Abstract
Hey there!! Welcome to the Comprehensive R-Tutorial.

WHAT THIS IS: An ever-expanding notebook of how to use R for various data-related activities. Because we are always learning (yay!), this notebook is never finished.

WHAT THIS IS NOT: A statistics class!!! You will find this workshop the most useful if you have taken a prior basic stats class and/or done your own research. While we will do our best to explain basic concepts, stats is a very large field and we are not experts :)

ALSO NOTE: all of the data in this is FAKE DATA. While the participants, groups, and tasks may look like they are from our study, the data is completely randomized by a MATLAB program. Additionally, if you feel that any of the below explanations are incorrect, don't make sense, or need to be elaborated upon, PLEASE LET US KNOW!

xoxo, SAP

#Here are a bunch of libraries that you may or may not need
library(agricolae) #used for simple effects
library(apaTables) #makes apa-formatted tables
library(broom) #converts objects so they can be used by tidy and ggplot
library(devtools) #package development tools; also handy for installing packages from GitHub
library(e1071) # for skew stats
library(effects) #glm stuff, i think
library(ez) #anovas, rm anovas
library(GGally) #addon for GGplot2
library(ggmap) #maps
library(ggplot2) #the prettiest graphs you ever did see
library(gridExtra) #nice tables for grobs
library(hexbin) #density graphs
library(HH) #statistical graphics and ANOVA support (Heiberger & Holland)
library(Hmisc) #arranging grobs, writing things as.numeric etc.
library(kableExtra) #tables for html
library(knitr) #lets you do extra stuff when you knit to html
library(lme4) #linear mixed effects models
library(lsr) #used for etaSquared
library(MBESS) #needed for some apa tables
library(multcomp) #simultaneous inference / multiple comparisons (e.g., glht for post-hocs)
library(nlme) #more linear model stuff (and nonlinear)
library(plyr) #for splitting, combining data
library(psych) #basically all the most basic stats in r
library(purrr) #more layout/visual stuff for grobs
library(RColorBrewer) #makes everything beautiful and colorful
library(readxl) #lets you import from excel
library(reshape2) #helps you reshape data, put in longform, etc
library(rgeos) #maps
library(rnaturalearth) #maps
library(rnaturalearthdata) #maps
library(scales) #for making those color mapped correlations
library(sf) #maps
library(stargazer) #makes nice tables for sweave,html
library(texreg) #puts output to html tables
library(tibble) #colnames and rownames. also good for grobs
library(tidyr) #more data manipulation stuff
library(xtable) #exports tables to LaTeX/HTML
library(lmerTest) #adds p-values to lme4 models
library(robustlmm) #robust linear mixed effects models
library(WRS) # Rand Wilcox's package for robust stats
library(WRS2) #CRAN-hosted subset of WRS
library(esc) #effect size calculators
#for whatever reason, this package needs to be loaded last
library(dplyr) #helps with summary tables

In its most simple form, you can use R as a glorified calculator. To run a line, hit "COMMAND+ENTER" ("CTRL+ENTER" on Windows). To run a whole chunk, you can press the green play button in the top right.
Arithmetic
## [1] 3
## [1] 3
## [1] 6
Order of operations
## [1] 12.68268
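The code chunks themselves are hidden in this knit, but operator precedence can be illustrated like this (illustrative values, not the hidden chunk's):

```r
2 + 3 * 4    # multiplication before addition: 14
(2 + 3) * 4  # parentheses override precedence: 20
2^3 + 1      # exponentiation binds tightest: 9
10 %% 3      # %% is the modulo (remainder) operator: 1
```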
Creating vectors (R's name for these is "vector" — a "list" is a different R type). Use c().
How long is your vector?
## [1] 8
Indexing: grab a single value from a vector, or multiple values (a negative index drops an element).
## [1] 1
## [1] 2 13 6 7 8
## [1] 5 4 1 13 6 7 8
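The source code for these chunks is hidden, but from the outputs above it likely resembled something like this (the vector below reproduces the printed values):

```r
x <- c(5, 4, 1, 2, 13, 6, 7, 8)  # create a vector with c()
length(x)  # 8 elements
x[3]       # single value: 1
x[4:8]     # a range of values: 2 13 6 7 8
x[-4]      # a negative index DROPS that element: 5 4 1 13 6 7 8
```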
rbind vs. cbind: what is the difference? rbind binds vectors together as rows; cbind binds them together as columns.
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] 1 2 3 4 5 6 7 8 9 10
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## a 300 4 1 2 13 6 7 8 5 20
## b 1 2 3 4 5 6 7 8 9 10
## a b
## [1,] 300 1
## [2,] 4 2
## [3,] 1 3
## [4,] 2 4
## [5,] 13 5
## [6,] 6 6
## [7,] 7 7
## [8,] 8 8
## [9,] 5 9
## [10,] 20 10
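Judging from the printed matrices, the hidden chunk was along these lines — note how the vector names become row names with rbind() and column names with cbind():

```r
a <- c(300, 4, 1, 2, 13, 6, 7, 8, 5, 20)
b <- 1:10
rbind(a, b)  # 2 rows x 10 columns
cbind(a, b)  # 10 rows x 2 columns
```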
You gotta gotta gotta know how to manipulate your data, or else you will spend HOURS trying to figure out why your stats aren’t running and why your graphs look like abstract art.
Note: this dataset is for the Flanker Fish task from the executive function battery. It includes:
- 3 groups (M, S, C)
- 2 Years
- 3 Blocks/Year
- 3 Conditions/Block
- Accuracy, RT
Additional variables are things like gender, age, WASI, etc. that we will use in different tests.
Again, this is fake data!!!!!
Something pretty important in R is being able to look at your data quickly.
## # A tibble: 6 x 55
## ID Group Age Gender WASI Y4_Total_AccSta… Y4_Incongruent_…
## <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 MUSI… 1 83 M 108 0.582 0.980
## 2 MUSI… 1 80 F 128 0.748 0.279
## 3 MUSI… 1 88 F 66 0.927 0.843
## 4 MUSI… 1 81 F 113 0.681 0.777
## 5 MUSI… 1 78 M 105 0.0506 0.391
## 6 MUSI… 1 79 M 80 0.834 0.643
## # … with 48 more variables: Y4_Neutral_AccStandard <dbl>,
## # Y4_Congruent_AccStandard <dbl>, Y4_Incongruent_RTStandard <dbl>,
## # Y4_Neutral_RTStandard <dbl>, Y4_Congruent_RTStandard <dbl>,
## # Y4_Total_RTStandard <dbl>, Y4_acc_Reversed <dbl>,
## # Y4_acc_I_Reversed <dbl>, Y4_acc_N_Reversed <dbl>,
## # Y4_acc_C_Reversed <dbl>, Y4_RT_I_Reversed <dbl>,
## # Y4_RT_N_Reversed <dbl>, Y4_RT_C_Reversed <dbl>, Y4_RT_Reversed <dbl>,
## # Y4_acc_Mixed <dbl>, Y4_acc_I_Mixed <dbl>, Y4_acc_N_Mixed <dbl>,
## # Y4_acc_C_Mixed <dbl>, Y4_RT_I_Mixed <dbl>, Y4_RT_N_Mixed <dbl>,
## # Y4_RT_C_Mixed <dbl>, Y4_RT_Mixed <dbl>, Y4_Switch_Cost_ <dbl>,
## # Y5_Total_AccStandard <dbl>, Y5_Incongruent_AccStandard <dbl>,
## # Y5_Neutral_AccStandard <dbl>, Y5_Congruent_AccStandard <dbl>,
## # Y5_Incongruent_RTStandard <dbl>, Y5_Neutral_RTStandard <dbl>,
## # Y5_Congruent_RTStandard <dbl>, Y5_Total_RTStandard <dbl>,
## # Y5_acc_Reversed <dbl>, Y5_acc_I_Reversed <dbl>,
## # Y5_acc_N_Reversed <dbl>, Y5_acc_C_Reversed <dbl>,
## # Y5_RT_I_Reversed <dbl>, Y5_RT_N_Reversed <dbl>,
## # Y5_RT_C_Reversed <dbl>, Y5_RT_Reversed <dbl>, Y5_acc_Mixed <dbl>,
## # Y5_acc_I_Mixed <dbl>, Y5_acc_N_Mixed <dbl>, Y5_acc_C_Mixed <dbl>,
## # Y5_RT_I_Mixed <dbl>, Y5_RT_N_Mixed <dbl>, Y5_RT_C_Mixed <dbl>,
## # Y5_RT_Mixed <dbl>, Y5_Switch_Cost <dbl>
## [1] "ID" "Group"
## [3] "Age" "Gender"
## [5] "WASI" "Y4_Total_AccStandard"
## [7] "Y4_Incongruent_AccStandard" "Y4_Neutral_AccStandard"
## [9] "Y4_Congruent_AccStandard" "Y4_Incongruent_RTStandard"
## [11] "Y4_Neutral_RTStandard" "Y4_Congruent_RTStandard"
## [13] "Y4_Total_RTStandard" "Y4_acc_Reversed"
## [15] "Y4_acc_I_Reversed" "Y4_acc_N_Reversed"
## [17] "Y4_acc_C_Reversed" "Y4_RT_I_Reversed"
## [19] "Y4_RT_N_Reversed" "Y4_RT_C_Reversed"
## [21] "Y4_RT_Reversed" "Y4_acc_Mixed"
## [23] "Y4_acc_I_Mixed" "Y4_acc_N_Mixed"
## [25] "Y4_acc_C_Mixed" "Y4_RT_I_Mixed"
## [27] "Y4_RT_N_Mixed" "Y4_RT_C_Mixed"
## [29] "Y4_RT_Mixed" "Y4_Switch_Cost_"
## [31] "Y5_Total_AccStandard" "Y5_Incongruent_AccStandard"
## [33] "Y5_Neutral_AccStandard" "Y5_Congruent_AccStandard"
## [35] "Y5_Incongruent_RTStandard" "Y5_Neutral_RTStandard"
## [37] "Y5_Congruent_RTStandard" "Y5_Total_RTStandard"
## [39] "Y5_acc_Reversed" "Y5_acc_I_Reversed"
## [41] "Y5_acc_N_Reversed" "Y5_acc_C_Reversed"
## [43] "Y5_RT_I_Reversed" "Y5_RT_N_Reversed"
## [45] "Y5_RT_C_Reversed" "Y5_RT_Reversed"
## [47] "Y5_acc_Mixed" "Y5_acc_I_Mixed"
## [49] "Y5_acc_N_Mixed" "Y5_acc_C_Mixed"
## [51] "Y5_RT_I_Mixed" "Y5_RT_N_Mixed"
## [53] "Y5_RT_C_Mixed" "Y5_RT_Mixed"
## [55] "Y5_Switch_Cost"
To check how many rows/columns you have, you can use the length function
## [1] 51
## [1] 55
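On a data frame, length() counts columns, while length() of a single column counts rows; nrow() and ncol() say the same thing more explicitly. A sketch with a toy data frame (a stand-in, since flanker itself isn't reproduced here):

```r
df <- data.frame(a = 1:3, b = 4:6)  # toy stand-in for a real dataset
length(df)    # number of columns: 2
length(df$a)  # number of rows, via one column: 3
nrow(df)      # 3
ncol(df)      # 2
```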
Also important is the class of your variables.
Class is important because it determines what kinds of stuff you can do with a variable. Factoring gets R to recognize something as categorical, for example.
To change the class of a variable, simply write:
#dataframe$variable = as.NEWCLASS(dataframe$variable)
#to look at the class of one variable
class(flanker$Gender)
## [1] "character"
## $ID
## [1] "character"
##
## $Group
## [1] "numeric"
##
## $Age
## [1] "numeric"
##
## $Gender
## [1] "character"
##
## $WASI
## [1] "numeric"
##
## $Y4_Total_AccStandard
## [1] "numeric"
##
## $Y4_Incongruent_AccStandard
## [1] "numeric"
##
## $Y4_Neutral_AccStandard
## [1] "numeric"
##
## $Y4_Congruent_AccStandard
## [1] "numeric"
##
## $Y4_Incongruent_RTStandard
## [1] "numeric"
##
## $Y4_Neutral_RTStandard
## [1] "numeric"
##
## $Y4_Congruent_RTStandard
## [1] "numeric"
##
## $Y4_Total_RTStandard
## [1] "numeric"
##
## $Y4_acc_Reversed
## [1] "numeric"
##
## $Y4_acc_I_Reversed
## [1] "numeric"
##
## $Y4_acc_N_Reversed
## [1] "numeric"
##
## $Y4_acc_C_Reversed
## [1] "numeric"
##
## $Y4_RT_I_Reversed
## [1] "numeric"
##
## $Y4_RT_N_Reversed
## [1] "numeric"
##
## $Y4_RT_C_Reversed
## [1] "numeric"
##
## $Y4_RT_Reversed
## [1] "numeric"
##
## $Y4_acc_Mixed
## [1] "numeric"
##
## $Y4_acc_I_Mixed
## [1] "numeric"
##
## $Y4_acc_N_Mixed
## [1] "numeric"
##
## $Y4_acc_C_Mixed
## [1] "numeric"
##
## $Y4_RT_I_Mixed
## [1] "numeric"
##
## $Y4_RT_N_Mixed
## [1] "numeric"
##
## $Y4_RT_C_Mixed
## [1] "numeric"
##
## $Y4_RT_Mixed
## [1] "numeric"
##
## $Y4_Switch_Cost_
## [1] "numeric"
##
## $Y5_Total_AccStandard
## [1] "numeric"
##
## $Y5_Incongruent_AccStandard
## [1] "numeric"
##
## $Y5_Neutral_AccStandard
## [1] "numeric"
##
## $Y5_Congruent_AccStandard
## [1] "numeric"
##
## $Y5_Incongruent_RTStandard
## [1] "numeric"
##
## $Y5_Neutral_RTStandard
## [1] "numeric"
##
## $Y5_Congruent_RTStandard
## [1] "numeric"
##
## $Y5_Total_RTStandard
## [1] "numeric"
##
## $Y5_acc_Reversed
## [1] "numeric"
##
## $Y5_acc_I_Reversed
## [1] "numeric"
##
## $Y5_acc_N_Reversed
## [1] "numeric"
##
## $Y5_acc_C_Reversed
## [1] "numeric"
##
## $Y5_RT_I_Reversed
## [1] "numeric"
##
## $Y5_RT_N_Reversed
## [1] "numeric"
##
## $Y5_RT_C_Reversed
## [1] "numeric"
##
## $Y5_RT_Reversed
## [1] "numeric"
##
## $Y5_acc_Mixed
## [1] "numeric"
##
## $Y5_acc_I_Mixed
## [1] "numeric"
##
## $Y5_acc_N_Mixed
## [1] "numeric"
##
## $Y5_acc_C_Mixed
## [1] "numeric"
##
## $Y5_RT_I_Mixed
## [1] "numeric"
##
## $Y5_RT_N_Mixed
## [1] "numeric"
##
## $Y5_RT_C_Mixed
## [1] "numeric"
##
## $Y5_RT_Mixed
## [1] "numeric"
##
## $Y5_Switch_Cost
## [1] "numeric"
attributes() gives you basically everything you might want to know about how R recognizes your data. You can call it on an entire dataframe, or on a single variable.
## NULL
Factoring is necessary in order for R to recognize things as different categories rather than individual characters, numbers, etc.
flanker$Group = as.factor(flanker$Group)
flanker$Gender = as.factor(flanker$Gender)
as.factor(flanker$Group)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3
## [36] 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 2
## Levels: 1 2 3
## [1] "factor"
## $levels
## [1] "1" "2" "3"
##
## $class
## [1] "factor"
## $names
## [1] "ID" "Group"
## [3] "Age" "Gender"
## [5] "WASI" "Y4_Total_AccStandard"
## [7] "Y4_Incongruent_AccStandard" "Y4_Neutral_AccStandard"
## [9] "Y4_Congruent_AccStandard" "Y4_Incongruent_RTStandard"
## [11] "Y4_Neutral_RTStandard" "Y4_Congruent_RTStandard"
## [13] "Y4_Total_RTStandard" "Y4_acc_Reversed"
## [15] "Y4_acc_I_Reversed" "Y4_acc_N_Reversed"
## [17] "Y4_acc_C_Reversed" "Y4_RT_I_Reversed"
## [19] "Y4_RT_N_Reversed" "Y4_RT_C_Reversed"
## [21] "Y4_RT_Reversed" "Y4_acc_Mixed"
## [23] "Y4_acc_I_Mixed" "Y4_acc_N_Mixed"
## [25] "Y4_acc_C_Mixed" "Y4_RT_I_Mixed"
## [27] "Y4_RT_N_Mixed" "Y4_RT_C_Mixed"
## [29] "Y4_RT_Mixed" "Y4_Switch_Cost_"
## [31] "Y5_Total_AccStandard" "Y5_Incongruent_AccStandard"
## [33] "Y5_Neutral_AccStandard" "Y5_Congruent_AccStandard"
## [35] "Y5_Incongruent_RTStandard" "Y5_Neutral_RTStandard"
## [37] "Y5_Congruent_RTStandard" "Y5_Total_RTStandard"
## [39] "Y5_acc_Reversed" "Y5_acc_I_Reversed"
## [41] "Y5_acc_N_Reversed" "Y5_acc_C_Reversed"
## [43] "Y5_RT_I_Reversed" "Y5_RT_N_Reversed"
## [45] "Y5_RT_C_Reversed" "Y5_RT_Reversed"
## [47] "Y5_acc_Mixed" "Y5_acc_I_Mixed"
## [49] "Y5_acc_N_Mixed" "Y5_acc_C_Mixed"
## [51] "Y5_RT_I_Mixed" "Y5_RT_N_Mixed"
## [53] "Y5_RT_C_Mixed" "Y5_RT_Mixed"
## [55] "Y5_Switch_Cost"
##
## $row.names
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50 51
##
## $class
## [1] "tbl_df" "tbl" "data.frame"
This creates new dataframes, which can be helpful when running analyses, or just if you want to see summaries between groups very quickly.
THERE IS A WRONG AND A RIGHT WAY TO DO THIS.
HERE IS THE WRONG WAY (I’m not even going to put it in real code because it is THAT BAD)
If you do the above, you are DOOMED to fail. You are just relabelling your data and swapping the NAMES of your factors, not the ORDER
HERE IS THE RIGHT WAY:
## # A tibble: 1 x 1
## Gender
## <fct>
## 1 M
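The "right way" isn't printed as code in this knit; the safe pattern is to pass levels and labels together to factor(), so each underlying value is explicitly mapped to its new name (the group names here match the ones used later in this notebook):

```r
grp <- c(1, 2, 3, 1, 2)
# levels lists the existing values IN ORDER; labels gives their new names IN THE SAME ORDER
grp <- factor(grp, levels = c(1, 2, 3), labels = c("Music", "Sport", "Control"))
levels(grp)  # "Music" "Sport" "Control"
```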
NAs cause lots of problems in analysis. Lots of stats functions will not run if you have NAs.
## [1] 51
## [1] 48
## # A tibble: 51 x 55
## ID Group Age Gender WASI Y4_Total_AccSta… Y4_Incongruent_…
## <chr> <fct> <dbl> <fct> <dbl> <dbl> <dbl>
## 1 MUSI… Music 83 F 108 0.582 0.980
## 2 MUSI… Music 80 F 128 0.748 0.279
## 3 MUSI… Music 88 F 66 0.927 0.843
## 4 MUSI… Music 81 F 113 0.681 0.777
## 5 MUSI… Music 78 M 105 0.0506 0.391
## 6 MUSI… Music 79 M 80 0.834 0.643
## 7 MUSI… Music 86 F 93 0.846 0.071
## 8 MUSI… Music 86 F 105 0.961 0.361
## 9 MUSI… Music 81 F 154 0.340 0.686
## 10 MUSI… Music 84 F 142 0.0514 0.164
## # … with 41 more rows, and 48 more variables:
## # Y4_Neutral_AccStandard <dbl>, Y4_Congruent_AccStandard <dbl>,
## # Y4_Incongruent_RTStandard <dbl>, Y4_Neutral_RTStandard <dbl>,
## # Y4_Congruent_RTStandard <dbl>, Y4_Total_RTStandard <dbl>,
## # Y4_acc_Reversed <dbl>, Y4_acc_I_Reversed <dbl>,
## # Y4_acc_N_Reversed <dbl>, Y4_acc_C_Reversed <dbl>,
## # Y4_RT_I_Reversed <dbl>, Y4_RT_N_Reversed <dbl>,
## # Y4_RT_C_Reversed <dbl>, Y4_RT_Reversed <dbl>, Y4_acc_Mixed <dbl>,
## # Y4_acc_I_Mixed <dbl>, Y4_acc_N_Mixed <dbl>, Y4_acc_C_Mixed <dbl>,
## # Y4_RT_I_Mixed <dbl>, Y4_RT_N_Mixed <dbl>, Y4_RT_C_Mixed <dbl>,
## # Y4_RT_Mixed <dbl>, Y4_Switch_Cost_ <dbl>, Y5_Total_AccStandard <dbl>,
## # Y5_Incongruent_AccStandard <dbl>, Y5_Neutral_AccStandard <dbl>,
## # Y5_Congruent_AccStandard <dbl>, Y5_Incongruent_RTStandard <dbl>,
## # Y5_Neutral_RTStandard <dbl>, Y5_Congruent_RTStandard <dbl>,
## # Y5_Total_RTStandard <dbl>, Y5_acc_Reversed <dbl>,
## # Y5_acc_I_Reversed <dbl>, Y5_acc_N_Reversed <dbl>,
## # Y5_acc_C_Reversed <dbl>, Y5_RT_I_Reversed <dbl>,
## # Y5_RT_N_Reversed <dbl>, Y5_RT_C_Reversed <dbl>, Y5_RT_Reversed <dbl>,
## # Y5_acc_Mixed <dbl>, Y5_acc_I_Mixed <dbl>, Y5_acc_N_Mixed <dbl>,
## # Y5_acc_C_Mixed <dbl>, Y5_RT_I_Mixed <dbl>, Y5_RT_N_Mixed <dbl>,
## # Y5_RT_C_Mixed <dbl>, Y5_RT_Mixed <dbl>, Y5_Switch_Cost <dbl>
## [1] 50
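The NA-handling code is hidden here; two common tools are na.omit() (drop every row containing any NA) and complete.cases() (a logical mask you can restrict to particular columns). A toy sketch:

```r
df <- data.frame(x = c(1, NA, 3), y = c(4, 5, NA))  # toy stand-in
nrow(df)                    # 3
nrow(na.omit(df))           # 1: rows 2 and 3 each contain an NA somewhere
df[complete.cases(df$x), ]  # drops only rows where x itself is NA
```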
You can also call by variable name.
y3 = read_excel("/Volumes/MusicProject/Individual_Projects/R/Comprehensive_R/Sample_Data/y3standard.xlsx")
#lets just manipulate it the same way, for simplicity
y3[1,4] = 'F'
y3 = y3[-1,]
flanker_standard = merge(flanker_standard, y3, by = "ID")
#do this stuff again
flanker_standard$Group = as.factor(flanker_standard$Group)
music_st = flanker_standard[which(flanker_standard$Group == '1'),]
sport_st = flanker_standard[which(flanker_standard$Group == '2'),]
control_st = flanker_standard[which(flanker_standard$Group == '3'),]

Great. Now you have some nice data. But is it really what you want? What about outliers? Let's go ahead and figure out what those are and decide if we want to remove them.
Let’s do it by group.
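The outlier code itself is hidden in this knit; one common rule of thumb (our assumption, not necessarily what was done here) is to flag values more than 2.5 SDs from their own group's mean, which ave() makes easy:

```r
set.seed(1)
d <- data.frame(Group = rep(c("Music", "Sport", "Control"), each = 20),
                RT = rnorm(60, mean = 600, sd = 100))  # fake RTs, like the rest of this notebook
# z-score each value within its own group
d$z <- ave(d$RT, d$Group, FUN = function(v) (v - mean(v)) / sd(v))
d[abs(d$z) > 2.5, ]  # candidate outliers, per group
```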
Ok. This is great and all, but you can’t really do very many stats on it, because it is in wide format.
Please note: this dataset was organized very carefully (only for the Standard block, ha ha), so that underscores separate the different variables. For example:
Y4_Total_AccStandard will be split into three pieces: Year = Y4, Condition = Total, var = AccStandard
There are other ways to do this, like many things in R, but this appears to be the most efficient IF you set up your headers correctly
flanker_long <- flanker_standardno %>%
gather(key = var, value = val, -(ID:WASI)) %>% #gather everything except columns ID through WASI
separate(var, c("Year","Condition","var")) %>% #separate by these (what your new columns will be)
spread(key = var, value = val)
#gotta factor those conditions
flanker_long$Condition = as.factor(flanker_long$Condition)
flanker_long$Year = as.factor(flanker_long$Year)
#for some reason, the values come out as characters. not ideal
flanker_long$AccStandard = as.numeric(flanker_long$AccStandard)
flanker_long$RTStandard = as.numeric(flanker_long$RTStandard)

Another way to make things long, for simpler data, is to use melt() from reshape2. A little more work-intensive.
#Let's say you just had a simple data set like this
rt <- read_excel("/Volumes/MusicProject/Individual_Projects/R/Comprehensive_R/Sample_Data/Simple.xlsx")
#fix it up
rt$group = as.factor(rt$group)
rt$year = as.factor(rt$year)
rt$ID = as.factor(rt$ID)
rtlong = melt(data = rt, id = c('ID','group', "year"))

MAJOR KEY!!!!!: BEFORE YOU RENAME YOUR COLUMNS, CHECK TO MAKE SURE THEY ARE IN THE RIGHT ORDER!!!!!! OTHERWISE, VAST CONFUSION MAY OCCUR.
Ok, but there is one more problem: we can't really look at results across years with a simple RM ANOVA unless all the participants have data for each year. So, we remove those that don't have all years. This can be done with na.omit() when in WIDE form, or by removing subjects that don't appear at least x times by ID (if you already have long-form data).
Let’s go back to our original dataset (flanker_long)
The ">" term should be one less than your max number of iterations. So, in this case I want each participant to have 12 data points (4 conditions (total, incongruent, congruent, neutral) for each of the 3 years). If someone has fewer than 12, they are OUT.
flanker_long = flanker_long[flanker_long$ID %in% names(table(flanker_long$ID)) [table(flanker_long$ID) >11],]
# so basically: flanker_long = only those rows where the participant shows up more than 11 times, because that means they have all data points for each year

ONE LAST THING. The long data includes a condition called "Total". We don't want that because it is just a sum of everything else! Let's remove it.
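The removal itself isn't shown; with a factor column, the subset-plus-droplevels() pattern looks like this (a toy stand-in for flanker_long):

```r
fl <- data.frame(Condition = factor(c("Total", "Incongruent", "Neutral", "Congruent")),
                 AccStandard = c(0.75, 0.6, 0.8, 0.85))  # toy stand-in
fl <- fl[fl$Condition != "Total", ]       # drop the Total rows
fl$Condition <- droplevels(fl$Condition)  # also drop the now-empty "Total" level
levels(fl$Condition)
```

(The summary() output further down still shows an empty "Total: 0" level, so the original chunk likely skipped the droplevels() step.)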
Cool! Let's make a summary table now. These are helpful when we go to make graphs (later on), but they are also just a nice way to look at your summary stats.
flankersum_acc = ddply(flanker_long, c("Group", "Condition", "Year"), summarise,
N = length(AccStandard), # calculates # of individuals per group by year
mean = mean(AccStandard), # calculates mean for each group by year
sd = sd(AccStandard), # calculates sd for each group by year
se = sd / sqrt(N))
flankersum_rt = ddply(flanker_long, c("Group", "Condition", "Year"), summarise,
N = length(RTStandard),
mean = mean(RTStandard),
sd = sd(RTStandard),
se = sd / sqrt(N))

Woohoo! We are now moving on to statistics! Below are some very basic stats tests you might want to run.
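A quick aside on the summary tables above: dplyr (loaded earlier) can build the same thing with group_by()/summarise() — a sketch on a toy data frame:

```r
library(dplyr)
fl <- data.frame(Group = rep(c("Music", "Sport"), each = 3),
                 AccStandard = c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4))  # toy stand-in
flsum <- fl %>%
  group_by(Group) %>%
  summarise(N = n(),
            mean = mean(AccStandard),
            sd = sd(AccStandard),
            se = sd / sqrt(N))  # summarise can reuse columns defined earlier in the same call
flsum
```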
Which test do I use?? Excellent question. Here is a nice website with details and R/SPSS code that you can use to make those decisions.
Additionally, below are two types of decision trees that are helpful.
It should be noted that the images below outline pretty old methods of statistics that are not super robust, meaning you may compromise some statistical power and potentially get some misleading results. As noted above, this is not a statistics course, so we will not be going into the math behind such arguments. Please use the images below with caution, as a place to start. There are lots of other more robust methods to carry out all of these analyses. We recommend this book by Rand Wilcox.
Tests for Association
Tests for Differences
You can also…. look at your summary spreadsheets!
## ID Group Age Gender WASI
## Length:423 Music :144 Min. :77.00 F:189 Min. : 56.0
## Class :character Sport :135 1st Qu.:80.00 M:234 1st Qu.: 89.0
## Mode :character Control:144 Median :82.00 Median :105.0
## Mean :82.11 Mean :103.7
## 3rd Qu.:84.00 3rd Qu.:113.0
## Max. :88.00 Max. :154.0
## Year Condition AccStandard RTStandard
## Y3:141 Congruent :141 Min. :0.0088 Min. : 361.0
## Y4:141 Incongruent:141 1st Qu.:0.3846 1st Qu.: 528.0
## Y5:141 Neutral :141 Median :0.6823 Median : 621.4
## Total : 0 Mean :0.6263 Mean : 635.2
## 3rd Qu.:0.9020 3rd Qu.: 734.1
## Max. :1.0000 Max. :1223.3
## vars n mean sd median trimmed mad
## ID* 1 47 NaN NA NA NaN NA
## Group* 2 47 2.00 0.83 2.00 2.00 1.48
## Age 3 47 82.11 2.97 82.00 82.05 2.97
## Gender* 4 47 1.55 0.50 2.00 1.56 0.00
## WASI 5 47 103.68 19.44 105.00 103.26 16.31
## Y4_Total_AccStandard 6 47 0.53 0.33 0.63 0.54 0.41
## Y4_Incongruent_AccStandard 7 47 0.49 0.28 0.41 0.48 0.37
## Y4_Neutral_AccStandard 8 47 0.40 0.27 0.38 0.39 0.32
## Y4_Congruent_AccStandard 9 47 0.86 0.09 0.87 0.86 0.12
## Y4_Incongruent_RTStandard 10 47 678.49 157.56 670.83 664.31 126.27
## Y4_Neutral_RTStandard 11 47 566.64 116.73 545.71 560.36 122.84
## Y4_Congruent_RTStandard 12 47 631.51 117.30 632.25 628.67 117.13
## Y4_Total_RTStandard 13 47 658.97 133.12 654.75 651.15 130.84
## Y5_Total_AccStandard 14 47 0.52 0.22 0.54 0.52 0.25
## Y5_Incongruent_AccStandard 15 47 0.65 0.28 0.64 0.67 0.34
## Y5_Neutral_AccStandard 16 47 0.84 0.27 1.00 0.89 0.00
## Y5_Congruent_AccStandard 17 47 0.68 0.29 0.73 0.71 0.40
## Y5_Incongruent_RTStandard 18 47 616.70 113.10 587.67 604.62 85.74
## Y5_Neutral_RTStandard 19 47 498.91 99.73 469.71 485.86 59.52
## Y5_Congruent_RTStandard 20 47 577.40 108.93 563.25 565.63 102.67
## Y5_Total_RTStandard 21 47 593.89 106.20 574.88 582.42 84.42
## Y3_Total_AccStandard 22 47 0.52 0.26 0.54 0.52 0.30
## Y3_Incongruent_AccStandard 23 47 0.52 0.29 0.50 0.52 0.33
## Y3_Neutral_AccStandard* 24 47 0.65 0.32 0.67 0.68 0.38
## Y3_Congruent_AccStandard 25 47 0.54 0.27 0.58 0.54 0.28
## Y3_Incongruent_RTStandard 26 47 738.52 96.65 748.35 741.96 134.08
## Y3_Neutral_RTStandard 27 47 728.52 118.28 743.65 730.92 150.09
## Y3_Congruent_RTStandard 28 47 679.91 123.36 664.67 677.53 165.51
## Y3_Total_RTStandard 29 47 671.95 116.67 672.04 667.23 144.65
## min max range skew kurtosis se
## ID* Inf -Inf -Inf NA NA NA
## Group* 1.00 3.00 2.00 0.00 -1.59 0.12
## Age 77.00 88.00 11.00 0.06 -1.01 0.43
## Gender* 1.00 2.00 1.00 -0.21 -2.00 0.07
## WASI 56.00 154.00 98.00 0.16 0.32 2.84
## Y4_Total_AccStandard 0.00 0.96 0.96 -0.32 -1.46 0.05
## Y4_Incongruent_AccStandard 0.07 0.96 0.90 0.18 -1.41 0.04
## Y4_Neutral_AccStandard 0.01 0.96 0.95 0.22 -1.22 0.04
## Y4_Congruent_AccStandard 0.71 1.00 0.29 -0.11 -1.23 0.01
## Y4_Incongruent_RTStandard 423.50 1223.33 799.83 1.08 1.67 22.98
## Y4_Neutral_RTStandard 361.00 856.43 495.43 0.49 -0.31 17.03
## Y4_Congruent_RTStandard 415.50 967.25 551.75 0.24 -0.12 17.11
## Y4_Total_RTStandard 425.56 1053.12 627.56 0.61 0.26 19.42
## Y5_Total_AccStandard 0.01 0.96 0.95 0.01 -0.66 0.03
## Y5_Incongruent_AccStandard 0.02 1.00 0.98 -0.46 -0.75 0.04
## Y5_Neutral_AccStandard 0.06 1.00 0.94 -1.53 0.86 0.04
## Y5_Congruent_AccStandard 0.01 1.00 0.99 -0.51 -0.82 0.04
## Y5_Incongruent_RTStandard 465.83 1016.50 550.67 1.29 1.96 16.50
## Y5_Neutral_RTStandard 371.29 836.29 465.00 1.40 1.67 14.55
## Y5_Congruent_RTStandard 399.25 910.25 511.00 1.11 1.25 15.89
## Y5_Total_RTStandard 439.69 974.62 534.94 1.36 2.26 15.49
## Y3_Total_AccStandard 0.08 0.98 0.89 0.08 -1.13 0.04
## Y3_Incongruent_AccStandard 0.03 0.99 0.96 -0.11 -1.10 0.04
## Y3_Neutral_AccStandard* 0.02 1.00 0.98 -0.55 -0.96 0.05
## Y3_Congruent_AccStandard 0.02 1.00 0.98 -0.12 -0.97 0.04
## Y3_Incongruent_RTStandard 548.19 898.65 350.46 -0.22 -1.17 14.10
## Y3_Neutral_RTStandard 510.27 899.53 389.26 -0.12 -1.30 17.25
## Y3_Congruent_RTStandard 500.02 890.86 390.84 0.10 -1.43 17.99
## Y3_Total_RTStandard 508.60 876.45 367.85 0.24 -1.28 17.02
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 47 82.11 2.97 82 82.05 2.97 77 88 11 0.06 -1.01 0.43
##
## Descriptive statistics by group
## group: Music
## vars n mean sd median trimmed mad
## ID* 1 16 NaN NA NA NaN NA
## Group* 2 16 1.00 0.00 1.00 1.00 0.00
## Age 3 16 82.31 3.11 82.50 82.29 3.71
## Gender* 4 16 1.38 0.50 1.00 1.36 0.00
## WASI 5 16 108.00 24.54 105.00 107.71 14.83
## Y4_Total_AccStandard 6 16 0.52 0.37 0.71 0.53 0.36
## Y4_Incongruent_AccStandard 7 16 0.47 0.29 0.39 0.46 0.36
## Y4_Neutral_AccStandard 8 16 0.35 0.22 0.34 0.34 0.22
## Y4_Congruent_AccStandard 9 16 0.88 0.09 0.90 0.88 0.09
## Y4_Incongruent_RTStandard 10 16 651.10 145.37 602.17 643.40 97.36
## Y4_Neutral_RTStandard 11 16 547.71 125.41 514.71 538.99 86.10
## Y4_Congruent_RTStandard 12 16 603.94 125.83 605.38 601.62 154.93
## Y4_Total_RTStandard 13 16 634.77 134.15 591.62 631.27 98.55
## Y5_Total_AccStandard 14 16 0.58 0.23 0.55 0.58 0.30
## Y5_Incongruent_AccStandard 15 16 0.80 0.32 1.00 0.84 0.00
## Y5_Neutral_AccStandard 16 16 0.80 0.27 1.00 0.82 0.00
## Y5_Congruent_AccStandard 17 16 0.97 0.09 1.00 0.98 0.00
## Y5_Incongruent_RTStandard 18 16 619.18 133.77 581.50 601.35 121.82
## Y5_Neutral_RTStandard 19 16 503.20 121.38 462.21 488.83 66.82
## Y5_Congruent_RTStandard 20 16 577.23 127.44 567.62 566.16 90.62
## Y5_Total_RTStandard 21 16 596.65 129.59 568.19 580.86 95.30
## Y3_Total_AccStandard 22 16 0.48 0.30 0.45 0.48 0.42
## Y3_Incongruent_AccStandard 23 16 0.65 0.26 0.68 0.67 0.21
## Y3_Neutral_AccStandard* 24 16 0.92 0.14 1.00 0.93 0.00
## Y3_Congruent_AccStandard 25 16 0.62 0.24 0.65 0.63 0.24
## Y3_Incongruent_RTStandard 26 16 750.66 102.78 770.23 756.51 116.85
## Y3_Neutral_RTStandard 27 16 723.42 119.32 750.25 723.69 142.80
## Y3_Congruent_RTStandard 28 16 683.47 131.26 661.24 681.57 177.69
## Y3_Total_RTStandard 29 16 671.29 127.65 686.88 667.94 166.71
## min max range skew kurtosis se
## ID* Inf -Inf -Inf NA NA NA
## Group* 1.00 1.00 0.00 NaN NaN 0.00
## Age 77.00 88.00 11.00 0.04 -1.13 0.78
## Gender* 1.00 2.00 1.00 0.47 -1.89 0.12
## WASI 66.00 154.00 88.00 0.32 -0.85 6.14
## Y4_Total_AccStandard 0.02 0.96 0.94 -0.24 -1.81 0.09
## Y4_Incongruent_AccStandard 0.07 0.96 0.89 0.24 -1.43 0.07
## Y4_Neutral_AccStandard 0.01 0.75 0.74 0.09 -1.04 0.05
## Y4_Congruent_AccStandard 0.71 0.98 0.27 -0.53 -1.29 0.02
## Y4_Incongruent_RTStandard 426.67 983.33 556.67 0.71 -0.26 36.34
## Y4_Neutral_RTStandard 361.00 856.43 495.43 0.86 0.06 31.35
## Y4_Congruent_RTStandard 415.50 824.75 409.25 0.01 -1.44 31.46
## Y4_Total_RTStandard 425.56 892.94 467.38 0.52 -0.77 33.54
## Y5_Total_AccStandard 0.22 0.96 0.74 0.12 -1.31 0.06
## Y5_Incongruent_AccStandard 0.02 1.00 0.98 -1.37 0.35 0.08
## Y5_Neutral_AccStandard 0.30 1.00 0.70 -0.78 -1.20 0.07
## Y5_Congruent_AccStandard 0.75 1.00 0.25 -2.06 2.40 0.02
## Y5_Incongruent_RTStandard 471.50 1016.50 545.00 1.42 2.19 33.44
## Y5_Neutral_RTStandard 371.29 836.29 465.00 1.35 1.06 30.34
## Y5_Congruent_RTStandard 399.25 910.25 511.00 1.01 0.57 31.86
## Y5_Total_RTStandard 439.69 974.62 534.94 1.41 1.90 32.40
## Y3_Total_AccStandard 0.08 0.92 0.83 0.11 -1.65 0.08
## Y3_Incongruent_AccStandard 0.09 0.98 0.88 -0.81 -0.22 0.06
## Y3_Neutral_AccStandard* 0.67 1.00 0.33 -1.03 -0.75 0.03
## Y3_Congruent_AccStandard 0.17 0.95 0.78 -0.35 -1.15 0.06
## Y3_Incongruent_RTStandard 548.19 871.26 323.07 -0.47 -1.26 25.69
## Y3_Neutral_RTStandard 543.48 899.48 356.00 -0.07 -1.45 29.83
## Y3_Congruent_RTStandard 504.44 889.12 384.68 0.23 -1.54 32.81
## Y3_Total_RTStandard 515.51 874.02 358.52 0.13 -1.55 31.91
## --------------------------------------------------------
## group: Sport
## vars n mean sd median trimmed mad
## ID* 1 15 NaN NA NA NaN NA
## Group* 2 15 2.00 0.00 2.00 2.00 0.00
## Age 3 15 81.40 3.11 82.00 81.31 2.97
## Gender* 4 15 1.73 0.46 2.00 1.77 0.00
## WASI 5 15 107.20 14.09 111.00 107.85 14.83
## Y4_Total_AccStandard 6 15 0.52 0.32 0.50 0.52 0.48
## Y4_Incongruent_AccStandard 7 15 0.54 0.29 0.67 0.54 0.30
## Y4_Neutral_AccStandard 8 15 0.52 0.32 0.66 0.53 0.23
## Y4_Congruent_AccStandard 9 15 0.81 0.07 0.80 0.81 0.10
## Y4_Incongruent_RTStandard 10 15 749.35 179.27 756.00 737.96 113.17
## Y4_Neutral_RTStandard 11 15 609.09 110.96 640.43 611.90 79.64
## Y4_Congruent_RTStandard 12 15 695.01 117.13 692.75 692.70 92.79
## Y4_Total_RTStandard 13 15 717.31 138.55 742.62 713.09 85.25
## Y5_Total_AccStandard 14 15 0.58 0.19 0.56 0.58 0.16
## Y5_Incongruent_AccStandard 15 15 0.52 0.28 0.52 0.52 0.28
## Y5_Neutral_AccStandard 16 15 0.73 0.35 0.89 0.76 0.16
## Y5_Congruent_AccStandard 17 15 0.64 0.21 0.70 0.64 0.14
## Y5_Incongruent_RTStandard 18 15 668.43 100.05 644.67 659.50 82.78
## Y5_Neutral_RTStandard 19 15 525.10 78.39 509.86 517.22 53.16
## Y5_Congruent_RTStandard 20 15 624.73 111.78 608.75 616.56 84.14
## Y5_Total_RTStandard 21 15 636.58 94.11 623.50 627.50 64.49
## Y3_Total_AccStandard 22 15 0.47 0.21 0.49 0.45 0.20
## Y3_Incongruent_AccStandard 23 15 0.46 0.27 0.45 0.45 0.29
## Y3_Neutral_AccStandard* 24 15 0.65 0.26 0.66 0.66 0.24
## Y3_Congruent_AccStandard 25 15 0.49 0.25 0.57 0.49 0.19
## Y3_Incongruent_RTStandard 26 15 740.88 91.40 739.97 739.37 99.22
## Y3_Neutral_RTStandard 27 15 771.80 109.39 795.94 779.54 99.85
## Y3_Congruent_RTStandard 28 15 689.54 139.12 757.00 688.63 158.24
## Y3_Total_RTStandard 29 15 660.70 122.91 623.46 655.80 140.95
## min max range skew kurtosis se
## ID* Inf -Inf -Inf NA NA NA
## Group* 2.00 2.00 0.00 NaN NaN 0.00
## Age 77.00 87.00 10.00 0.08 -1.53 0.80
## Gender* 1.00 2.00 1.00 -0.95 -1.16 0.12
## WASI 82.00 124.00 42.00 -0.59 -1.12 3.64
## Y4_Total_AccStandard 0.06 0.95 0.88 -0.09 -1.66 0.08
## Y4_Incongruent_AccStandard 0.12 0.96 0.84 -0.14 -1.71 0.08
## Y4_Neutral_AccStandard 0.01 0.96 0.95 -0.42 -1.54 0.08
## Y4_Congruent_AccStandard 0.71 0.90 0.19 0.01 -1.63 0.02
## Y4_Incongruent_RTStandard 423.50 1223.33 799.83 0.75 1.17 46.29
## Y4_Neutral_RTStandard 375.71 805.86 430.14 -0.37 -0.58 28.65
## Y4_Congruent_RTStandard 452.75 967.25 514.50 0.28 0.37 30.24
## Y4_Total_RTStandard 436.38 1053.12 616.75 0.30 0.60 35.77
## Y5_Total_AccStandard 0.26 0.89 0.63 0.10 -0.98 0.05
## Y5_Incongruent_AccStandard 0.02 0.97 0.95 0.03 -1.15 0.07
## Y5_Neutral_AccStandard 0.06 1.00 0.94 -0.82 -1.13 0.09
## Y5_Congruent_AccStandard 0.29 0.99 0.71 -0.21 -0.91 0.05
## Y5_Incongruent_RTStandard 562.33 890.67 328.33 0.90 -0.58 25.83
## Y5_Neutral_RTStandard 431.14 721.43 290.29 0.99 0.19 20.24
## Y5_Congruent_RTStandard 460.75 895.00 434.25 0.67 0.06 28.86
## Y5_Total_RTStandard 517.75 873.38 355.62 1.00 0.27 24.30
## Y3_Total_AccStandard 0.13 0.98 0.84 0.47 -0.15 0.06
## Y3_Incongruent_AccStandard 0.08 0.91 0.84 0.24 -1.29 0.07
## Y3_Neutral_AccStandard* 0.09 0.98 0.88 -0.75 -0.45 0.07
## Y3_Congruent_AccStandard 0.02 0.98 0.97 -0.22 -0.71 0.07
## Y3_Incongruent_RTStandard 602.82 898.65 295.83 0.08 -1.28 23.60
## Y3_Neutral_RTStandard 543.41 899.53 356.11 -0.65 -0.86 28.24
## Y3_Congruent_RTStandard 500.02 890.86 390.84 -0.11 -1.71 35.92
## Y3_Total_RTStandard 508.60 876.45 367.85 0.45 -1.22 31.73
## --------------------------------------------------------
## group: Control
## vars n mean sd median trimmed mad
## ID* 1 16 NaN NA NA NaN NA
## Group* 2 16 3.00 0.00 3.00 3.00 0.00
## Age 3 16 82.56 2.76 82.00 82.50 2.97
## Gender* 4 16 1.56 0.51 2.00 1.57 0.00
## WASI 5 16 96.06 16.65 98.00 97.07 14.08
## Y4_Total_AccStandard 6 16 0.55 0.31 0.62 0.56 0.26
## Y4_Incongruent_AccStandard 7 16 0.45 0.28 0.40 0.44 0.28
## Y4_Neutral_AccStandard 8 16 0.33 0.23 0.27 0.32 0.21
## Y4_Congruent_AccStandard 9 16 0.89 0.08 0.88 0.89 0.10
## Y4_Incongruent_RTStandard 10 16 639.45 132.49 652.83 621.52 99.46
## Y4_Neutral_RTStandard 11 16 545.79 109.17 547.43 536.42 113.10
## Y4_Congruent_RTStandard 12 16 599.56 87.33 582.62 598.89 78.58
## Y4_Total_RTStandard 13 16 628.47 115.79 631.00 617.48 103.74
## Y5_Total_AccStandard 14 16 0.41 0.21 0.41 0.42 0.26
## Y5_Incongruent_AccStandard 15 16 0.63 0.16 0.64 0.62 0.11
## Y5_Neutral_AccStandard 16 16 0.99 0.04 1.00 1.00 0.00
## Y5_Congruent_AccStandard 17 16 0.44 0.24 0.46 0.44 0.27
## Y5_Incongruent_RTStandard 18 16 565.72 80.83 539.33 557.04 53.13
## Y5_Neutral_RTStandard 19 16 470.08 91.86 453.93 457.33 49.03
## Y5_Congruent_RTStandard 20 16 533.20 64.52 520.12 527.93 62.08
## Y5_Total_RTStandard 21 16 551.11 75.78 534.94 543.52 53.56
## Y3_Total_AccStandard 22 16 0.62 0.24 0.64 0.63 0.32
## Y3_Incongruent_AccStandard 23 16 0.45 0.30 0.47 0.44 0.29
## Y3_Neutral_AccStandard* 24 16 0.40 0.28 0.41 0.39 0.36
## Y3_Congruent_AccStandard 25 16 0.51 0.31 0.47 0.51 0.37
## Y3_Incongruent_RTStandard 26 16 724.16 99.53 723.89 727.07 111.46
## Y3_Neutral_RTStandard 27 16 693.05 119.40 666.95 691.52 120.32
## Y3_Congruent_RTStandard 28 16 667.32 105.34 657.52 667.11 132.64
## Y3_Total_RTStandard 29 16 683.15 105.32 701.66 679.91 139.51
## min max range skew kurtosis se
## ID* Inf -Inf -Inf NA NA NA
## Group* 3.00 3.00 0.00 NaN NaN 0.00
## Age 78.00 88.00 10.00 0.20 -1.03 0.69
## Gender* 1.00 2.00 1.00 -0.23 -2.07 0.13
## WASI 56.00 122.00 66.00 -0.56 0.04 4.16
## Y4_Total_AccStandard 0.00 0.92 0.92 -0.59 -1.16 0.08
## Y4_Incongruent_AccStandard 0.07 0.94 0.87 0.40 -1.29 0.07
## Y4_Neutral_AccStandard 0.03 0.78 0.75 0.46 -1.09 0.06
## Y4_Congruent_AccStandard 0.74 1.00 0.26 -0.24 -1.25 0.02
## Y4_Incongruent_RTStandard 487.50 1042.33 554.83 1.52 2.62 33.12
## Y4_Neutral_RTStandard 392.29 830.57 438.29 0.89 0.46 27.29
## Y4_Congruent_RTStandard 456.75 751.75 295.00 0.18 -1.15 21.83
## Y4_Total_RTStandard 468.62 942.19 473.56 0.98 0.85 28.95
## Y5_Total_AccStandard 0.01 0.71 0.70 -0.13 -1.17 0.05
## Y5_Incongruent_AccStandard 0.33 0.98 0.65 -0.01 -0.26 0.04
## Y5_Neutral_AccStandard 0.83 1.00 0.17 -3.28 9.36 0.01
## Y5_Congruent_AccStandard 0.01 0.81 0.81 -0.21 -1.21 0.06
## Y5_Incongruent_RTStandard 465.83 787.17 321.33 1.28 1.07 20.21
## Y5_Neutral_RTStandard 384.71 734.00 349.29 1.67 2.03 22.96
## Y5_Congruent_RTStandard 463.25 677.00 213.75 0.80 -0.46 16.13
## Y5_Total_RTStandard 466.62 741.81 275.19 1.21 0.54 18.95
## Y3_Total_AccStandard 0.23 0.94 0.71 -0.10 -1.44 0.06
## Y3_Incongruent_AccStandard 0.03 0.99 0.96 0.22 -1.02 0.08
## Y3_Neutral_AccStandard* 0.02 0.91 0.89 0.33 -1.06 0.07
## Y3_Congruent_AccStandard 0.05 1.00 0.95 0.18 -1.39 0.08
## Y3_Incongruent_RTStandard 548.36 859.27 310.91 -0.16 -1.40 24.88
## Y3_Neutral_RTStandard 510.27 897.27 387.00 0.34 -1.18 29.85
## Y3_Congruent_RTStandard 513.41 824.12 310.71 0.03 -1.64 26.34
## Y3_Total_RTStandard 555.65 856.01 300.36 0.18 -1.56 26.33
DISCLAIMER: Basically all of basic statistics relies on a set of assumptions, namely that your data is normally distributed and has homogeneity of variance. It would be really lovely if we could just test these assumptions, make sure the tests don’t fail, and then move on to using our classic tests. However, that is not how this works. The Shapiro-Wilk test is good at telling you when your data is NOT normal (you can trust that result), but just because it doesn’t reject doesn’t mean you can accept that the data is normal. Basically, even if you test for things, you never really know. So, please use classic methods with caution.
Normality and Skewness
## [1] 0.06103718
What is this??
Skewness is a measure of symmetry, basically. Negative skewness = mean < median (left skew); positive skewness = mean > median (right skew).
When you should worry: >1.96 or < -1.96
Perfectly non-skewed data has mean = median and skewness = 0.
Kurtosis
## [1] -0.9274062
What is this??
Kurtosis is similar to skewness, in that it describes the data’s distribution. It measures the tail shape of the distribution/ how “peaked” the distribution is.
A thin-tailed data distribution has too much data in the middle. A fat-tailed data distribution has too much data on the tails.
A perfect normal distribution has 0 excess kurtosis.
The acceptable range for this is similar to skewness, above.
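The two numbers shown above (skewness, then kurtosis) can be computed with the e1071 package loaded at the top of this notebook. A minimal sketch; using `flanker$Age` here is an assumption, swap in whichever column you are checking:

```r
library(e1071)

# hypothetical continuous variable -- substitute your own column
x <- flanker$Age

skewness(x)  # ~0 means symmetric (mean == median)
kurtosis(x)  # excess kurtosis; ~0 matches a normal distribution
```

Note that e1071 reports *excess* kurtosis (normal = 0), which is what the explanation above assumes.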
Test for normality
##
## Shapiro-Wilk normality test
##
## data: flanker$Y4_Total_AccStandard
## W = 0.8859, p-value = 0.0001698
If the test rejects (i.e., the data is non-normal), p will be less than 0.05.
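The output above comes from a call like this (a sketch, assuming `flanker` is the data frame used throughout this notebook):

```r
# Shapiro-Wilk test of normality; W near 1 is consistent with normality,
# p < 0.05 means the test rejects normality
shapiro.test(flanker$Y4_Total_AccStandard)
```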
How to report
“Age was normally distributed with skewness of 0.06 and kurtosis of -0.93.”
Categorical data is data that is not continuous. This includes variables like Gender, or things that have a yes/no answer. We can look at this data in many ways, but here is one. This test will tell us if there are gender differences between groups.
##
## Pearson's Chi-squared test
##
## data: flanker_long$Group and flanker_long$Gender
## X-squared = 36.273, df = 2, p-value = 1.328e-08
What is this?? This tests if 2 categorical variables are related in some population.
If the p value is less than 0.05, we reject the null hypothesis that the two variables are independent. Basically, the groups differ in gender distribution.
How to report
“An association between group and gender was observed, χ2(2) = 36.27, p < 0.01.”
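The chi-squared output above can be reproduced with a call like this (sketch; assumes `Group` and `Gender` are factors in `flanker_long`):

```r
# chisq.test accepts two factors directly...
chisq.test(flanker_long$Group, flanker_long$Gender)

# ...or, equivalently, a contingency table built first
chisq.test(table(flanker_long$Group, flanker_long$Gender))
```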
##
## Welch Two Sample t-test
##
## data: music$Y4_Total_AccStandard and control$Y4_Total_AccStandard
## t = 0.17446, df = 32.984, p-value = 0.8626
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2121033 0.2518894
## sample estimates:
## mean of x mean of y
## 0.5714368 0.5515437
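The Welch test above comes from a call like this (a sketch, assuming `music` and `control` are the per-group subsets of the flanker data):

```r
# Welch's t-test: var.equal = FALSE is the default, which allows
# unequal variances -- hence the fractional degrees of freedom
t.test(music$Y4_Total_AccStandard, control$Y4_Total_AccStandard)
```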
This is the classic non-parametric alternative to the t test. It is HORRIBLE if you have tied values. So watch out for that.
##
## Wilcoxon rank sum test
##
## data: music$Y4_Total_AccStandard and control$Y4_Total_AccStandard
## W = 171, p-value = 0.5448
## alternative hypothesis: true location shift is not equal to 0
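A sketch of the call behind the output above (same assumed group subsets as the t test):

```r
# Wilcoxon rank sum (Mann-Whitney) test; with tied values R cannot
# compute an exact p-value, so set exact = FALSE if you have ties
wilcox.test(music$Y4_Total_AccStandard, control$Y4_Total_AccStandard)
```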
This is a robust alternative to the t test and it performs well even with tied values. It tests three different hypotheses (equal, greater, less than).
## $n1
## [1] 19
##
## $n2
## [1] 16
##
## $cl
## [1] -0.260875
##
## $cu
## [1] 0.4764344
##
## $d
## [1] 0.125
##
## $sqse.d
## [1] 0.04094187
##
## $phat
## [1] 0.4375
##
## $summary.dvals
## P(X<Y) P(X=Y) P(X>Y)
## [1,] 0.4375 0 0.5625
##
## $ci.p
## [1] 0.2617828 0.6304375
This plot shows the distribution of D, where D is the difference between a randomly sampled observation from group 1 and a randomly sampled observation from group 2. D will be symmetric around 0 when the distributions are identical.
## $n1
## [1] 19
##
## $n2
## [1] 16
##
## $d.hat
## [1] 0.125
##
## $d.ci
## [1] -0.2608750 0.4764344
##
## $p.value
## [1] 0.54
##
## $p.hat
## [1] 0.4375
##
## $p.ci
## [1] 0.2617828 0.6304375
##
## $summary.dvals
## P(X<Y) P(X=Y) P(X>Y)
## [1,] 0.4375 0 0.5625
This works for wide-format data
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 2 0.010 0.00487 0.043 0.958
## Residuals 44 4.944 0.11236
But it also works for long-format data
## Df Sum Sq Mean Sq F value Pr(>F)
## Year 2 2.10 1.0489 11.85 9.83e-06 ***
## Residuals 420 37.17 0.0885
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
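Both tables above come from plain `aov()` calls; a sketch under the assumption that the wide data frame is `flanker` and the long one is `flanker_long`:

```r
# wide: one row per participant, between-subjects factor only
summary(aov(Y4_Total_AccStandard ~ Group, data = flanker))

# long: one row per participant-year, so Year can be a factor
summary(aov(AccStandard ~ Year, data = flanker_long))
```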
But let’s say you wanted to look at 2 variables. You might want to do a 2-way ANOVA.
## Warning in aov(AccStandard ~ Condition * Group + Error(ID/(Condition *
## Group)), : Error() model is singular
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 2 1.821 0.9106 10.96 0.000138 ***
## Residuals 44 3.657 0.0831
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID:Condition
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 2 1.423 0.7115 15.113 2.28e-06 ***
## Condition:Group 4 0.238 0.0594 1.262 0.291
## Residuals 88 4.143 0.0471
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 282 27.98 0.09924
If, for example, you had group differences in Age, you might want to add it in as a co-variate.
Just change that * to a +
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 2 1.42 0.7115 7.885 0.000435 ***
## Age 1 0.04 0.0367 0.407 0.524081
## Residuals 419 37.81 0.0902
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
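A sketch of the covariate version: swapping `*` for `+` adds Age as a covariate rather than as an interacting factor (variable names assumed from earlier sections):

```r
# ANCOVA-style model: Condition effect adjusted for Age
summary(aov(AccStandard ~ Condition + Age, data = flanker_long))
```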
See page 441 of Rand’s book (11.6.3). Please read it to understand how bootstrapping works. This is probably the most robust way to do a between/within analysis. You can also do a single within or single between analysis.
Data setup
#isolate your variable into a single dataset
flanker_avonly = flanker_long[,c(1,2,6,8)]
#put in list mode
z = fac2list(flanker_avonly[,4], flanker_avonly[,c(2,3)])
## [1] "Group Levels:"
## [,1] [,2]
## [1,] 1 1
## [2,] 1 2
## [3,] 1 3
## [4,] 2 1
## [5,] 2 2
## [6,] 2 3
## [7,] 3 1
## [8,] 3 2
## [9,] 3 3
## Error in formula.default(object, env = baseenv()): invalid formula
## Error in formula.default(object, env = baseenv()): invalid formula
## Error in formula.default(object, env = baseenv()): invalid formula
When reporting statistics in papers, you need to include eta squared (an effect size). These values do not appear automatically with this call, so you have to do this:
## eta.sq eta.sq.part
## Year 0.05342401 0.05838327
## Group 0.04638160 0.05108024
## Year:Group 0.03856145 0.04283681
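The eta-squared table above can be produced with `etaSquared()` from the lsr package loaded at the top. A sketch; `flanker_aov` is a hypothetical name for the fitted model:

```r
library(lsr)

# fit assumed from the long data with Year and Group as factors
flanker_aov <- aov(AccStandard ~ Year * Group, data = flanker_long)
etaSquared(flanker_aov)  # returns eta.sq and eta.sq.part columns
```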
Because papers (especially older ones) sometimes don’t report it! Here is how to calculate it.
Kind of like Cohen’s d, but reduces bias.
esc_mean_sd(grp1m = 41.98, #mean of the intervention group
grp1sd = 2.47, #SD of the intervention group
grp1n = 18, #n of the intervention group
grp2m = 43.17, #mean of other group
grp2sd = 3.69, #SD of other group
grp2n = 15, #n of the other group
es.type = "g") #effect measure; use "d" for Cohen's d
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: mean and sd to effect size Hedges' g
## Effect Size: -0.3768
## Standard Error: 0.3528
## Variance: 0.1245
## Lower CI: -1.0683
## Upper CI: 0.3147
## Weight: 8.0333
esc_f(f=5.13, #F value from the anova
grp1n = 13, #number of participants in group 1
grp2n = 13, #number of participants in group 2
es.type = "g")
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: F-value (one-way-Anova) to effect size Hedges' g
## Effect Size: 0.8603
## Standard Error: 0.4111
## Variance: 0.1690
## Lower CI: 0.0545
## Upper CI: 1.6661
## Weight: 5.9163
esc_mean_se(grp1m = 8.5,
grp1se = 1.5,
grp1n = 50,
grp2m = 11,
grp2se = 1.8,
grp2n = 60,
es.type = "g")
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: mean and se to effect size Hedges' g
## Effect Size: -0.1998
## Standard Error: 0.1920
## Variance: 0.0369
## Lower CI: -0.5760
## Upper CI: 0.1765
## Weight: 27.1366
Unstandardized
esc_B(b=3.3, #unstandardized coef. b (treatment predictor)
sdy=5, #sd of the dependent variable y
grp1n = 100, #n of first group
grp2n = 150, # n of second group
es.type = "g")
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: unstandardized regression coefficient to effect size Hedges' g
## Effect Size: 0.6941
## Standard Error: 0.1328
## Variance: 0.0176
## Lower CI: 0.4338
## Upper CI: 0.9544
## Weight: 56.7018
Standardized
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: standardized regression coefficient to effect size Hedges' g
## Effect Size: 1.9868
## Standard Error: 0.1569
## Variance: 0.0246
## Lower CI: 1.6793
## Upper CI: 2.2942
## Weight: 40.6353
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: chi-squared-value to effect size Cox odds ratios
## Effect Size: 2.9858
## Standard Error: 0.3478
## Variance: 0.1210
## Lower CI: 1.5101
## Upper CI: 5.9036
## Weight: 8.2667
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: point-biserial r to effect size Hedges' g
## Effect Size: 0.5170
## Standard Error: 0.1380
## Variance: 0.0190
## Lower CI: 0.2465
## Upper CI: 0.7875
## Weight: 52.4967
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: t-value to effect size Hedges' g
## Effect Size: 0.3806
## Standard Error: 0.3516
## Variance: 0.1236
## Lower CI: -0.3085
## Upper CI: 1.0697
## Weight: 8.0887
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: unstandardized regression coefficient to effect size Hedges' g
## Effect Size: 0.6941
## Standard Error: 0.1328
## Variance: 0.0176
## Lower CI: 0.4338
## Upper CI: 0.9544
## Weight: 56.7018
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: standardized regression coefficient to effect size Hedges' g
## Effect Size: -0.5773
## Standard Error: 0.1606
## Variance: 0.0258
## Lower CI: -0.8920
## Upper CI: -0.2626
## Weight: 38.7879
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = AccStandard ~ Year, data = flanker_long)
##
## $Year
## diff lwr upr p adj
## Y4-Y3 0.008040865 -0.07529259 0.09137432 0.9720074
## Y5-Y3 0.153246940 0.06991348 0.23658040 0.0000565
## Y5-Y4 0.145206075 0.06187262 0.22853953 0.0001475
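The table above is standard `TukeyHSD()` output; a sketch of the call (the fitted model formula is shown in the output itself):

```r
# Tukey's HSD post hoc on a one-way ANOVA fit
fit <- aov(AccStandard ~ Year, data = flanker_long)
TukeyHSD(fit, conf.level = 0.95)
```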
What is all of this??
We use ANOVA to determine if there are any statistically significant differences between groups on a given dependent variable.
The null hypothesis states that the groups are all the same, and that the means are not statistically different from each other.
If a significant difference is found, it is called a main effect, and must then be interpreted using a post-hoc test. It spits out something called an F ratio, which is F = variation BETWEEN samples/ variation WITHIN samples. The null hypothesis says that the F ratio is 1.
So, if it is greater than 1, it means that the variation BETWEEN your samples is greater than the variation WITHIN your samples. This means that something is acting upon your data, producing change that is OUTSIDE of the variation that comes from within individuals.
Post hocs let you know where that main effect is coming from. They are a posteriori tests, which means they are performed after the ANOVA. Performing them before the ANOVA will increase your chances of obtaining a Type 1 error (false positive), and is generally considered poor practice.
However, there is also an argument that using any kind of post hoc AFTER doing your global test (like an F test) completely changes the properties of the post hoc… and you’re not really testing what you think you’re testing. So, things to think about.
How to report
“A 1-way ANOVA revealed a main effect of Year on accuracy, F(2,420) = 11.85, p < 0.001, η2 = 0.05. Post-hoc analysis using Tukey’s HSD indicated that participants were more accurate in Year 5 than in Year 3 (p < 0.001) and Year 4 (p < 0.001).”
Looking across years, it is helpful to do a repeated measures ANOVA
flanker_anova = ezANOVA(data = flanker_long,
dv = AccStandard,
wid = ID,
within = c(Year,Condition),
between = Group,
detailed = F)
It will likely tell you your data is unbalanced, and that is just how it is!!! Who can have perfect retention, really?
Now, see the results.
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 Group 2 44 10.955254 1.377071e-04 * 0.07218744
## 3 Year 2 88 16.487581 8.290947e-07 * 0.08224664
## 5 Condition 2 88 15.113204 2.279290e-06 * 0.05730631
## 4 Group:Year 4 88 5.950368 2.761895e-04 * 0.06075578
## 6 Group:Condition 4 88 1.262064 2.910140e-01 0.01005078
## 7 Year:Condition 4 176 24.916370 2.331043e-16 * 0.19494460
## 8 Group:Year:Condition 8 176 6.805202 8.963359e-08 * 0.11682089
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 3 Year 0.9883816 0.77782208
## 4 Group:Year 0.9883816 0.77782208
## 5 Condition 0.9909077 0.82170173
## 6 Group:Condition 0.9909077 0.82170173
## 7 Year:Condition 0.6274167 0.01946062 *
## 8 Group:Year:Condition 0.6274167 0.01946062 *
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe
## 3 Year 0.9885151 9.393954e-07 * 1.0347479
## 4 Group:Year 0.9885151 2.951186e-04 * 1.0347479
## 5 Condition 0.9909897 2.492466e-06 * 1.0375199
## 6 Group:Condition 0.9909897 2.911876e-01 1.0375199
## 7 Year:Condition 0.8507300 2.937179e-14 * 0.9306791
## 8 Group:Year:Condition 0.8507300 6.724802e-07 * 0.9306791
## p[HF] p[HF]<.05
## 3 8.290947e-07 *
## 4 2.761895e-04 *
## 5 2.279290e-06 *
## 6 2.910140e-01
## 7 2.200106e-15 *
## 8 2.282444e-07 *
What does this mean??
Like a 1-way ANOVA, a RM ANOVA looks at related (not independent) groups. It allows us to look at groups over time.
Here, you can look at an interaction effect as well as any main effects. Interaction effects show how two things are related as a function of some other thing!
How to report
"A repeated-measures ANOVA revealed a significant main effect of ___(see above). A significant Year*Group interaction effect was also observed, F(4,88) = 5.95, p< 0.001,η2 = 0.06. Follow-up analyses indicated that the music group had significantly higher accuracy than the control group, only at Year 5 (p< 0.05)."
Look! Above, it shows you the sphericity assumptions AND the effect sizes. How great is that.
What are sphericity assumptions? Sphericity is an assumption of a RM ANOVA: the variances of the differences between all possible pairs of a within-subjects condition are equal. For example, the variance of the Y3−Y4 differences equals the variance of the Incongruent−Congruent differences. When violated, you can get an inflated F-ratio.
Mauchly’s test of sphericity tells you whether or not the assumption is violated, by giving a p value. When the probability of Mauchly’s test is < 0.05, the assumption has been violated. Therefore, we can conclude that there are significant differences between the variances of the differences.
This test is required when you have more than 2 levels of a given within subjects condition. Here, we have 3 years and 3 conditions, so we definitely should look at it.
GGe = Greenhouse-Geisser epsilon (what you report below)
p[GG] = adjusted p value
you also need the corrected degrees of freedom. Calculate that by:
df(n) x GGe, df(d) x GGe
dfn_corrected = (flanker_anova$ANOVA$DFn[5])*(flanker_anova$`Sphericity Corrections`$GGe[5])
#[5] is the row that you are specifying
print(dfn_corrected)
## [1] 3.40292
dfe_corrected = (flanker_anova$ANOVA$DFd[5])*(flanker_anova$`Sphericity Corrections`$GGe[5])
print(dfe_corrected)
## [1] 74.86424
When it is violated, you need to correct the degrees of freedom.
How to report
“Mauchly’s test indicated that the assumption of sphericity had been violated for the Year by Condition interaction, therefore degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = 0.85), F(3.4,74.86) = 6.81, p < 0.001.”
People sometimes report the chi-squared but, tbh, I don’t know where that comes from and I’ve rarely seen it. Sometimes people don’t even put anything; they just say “Sphericity corrections using G-G were used when appropriate,” and it’s assumed that if you have a weird-ass DF, that is the reason why.
What do the main effects mean??
output = lme(AccStandard ~ Year,
random = ~1|ID,
data =flanker_long,
method = "ML")
tukey = glht(output, linfct = mcp(Year = "Tukey"))
summary(tukey)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: lme.formula(fixed = AccStandard ~ Year, data = flanker_long,
## random = ~1 | ID, method = "ML")
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## Y4 - Y3 == 0 0.008041 0.034576 0.233 0.971
## Y5 - Y3 == 0 0.153247 0.034576 4.432 <1e-04 ***
## Y5 - Y4 == 0 0.145206 0.034576 4.200 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
Ok, but what do all of those interaction effects mean? Let us explore with a simple effects test. Let’s just pretend, first that we had a YEAR BY GROUP interaction.
First, split up into separate data sets.
Y3 = subset(flanker_long, Year == "Y3")
Y4 = subset(flanker_long, Year == "Y4")
Y5 = subset(flanker_long, Year == "Y5")
Then, run ANOVAs on group within each year.
Y5Out = aov(AccStandard ~ Group, data = Y5)
Y4Out = aov(AccStandard ~ Group, data = Y4)
Y3Out = aov(AccStandard ~ Group, data = Y3)
summary(Y5Out)
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 2 1.295 0.6477 8.352 0.000377 ***
## Residuals 138 10.702 0.0775
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 2 0.123 0.06171 0.662 0.517
## Residuals 138 12.865 0.09322
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 2 1.917 0.9584 12.88 7.43e-06 ***
## Residuals 138 10.267 0.0744
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Do a post hoc to find any significant differences between groups in each year.
##
## Study: Y5Out ~ "Group"
##
## HSD Test for AccStandard
##
## Mean Square Error: 0.07754781
##
## Group, means
##
## AccStandard std r Min Max
## Control 0.6854810 0.2850832 48 0.0088 1
## Music 0.8555076 0.2571079 48 0.0162 1
## Sport 0.6303778 0.2929041 45 0.0244 1
##
## Alpha: 0.05 ; DF Error: 138
## Critical Value of Studentized Range: 3.350657
##
## Comparison between treatments means
##
## difference pvalue signif. LCL UCL
## Control - Music -0.17002660 0.0092 ** -0.30470381 -0.03534939
## Control - Sport 0.05510326 0.6074 -0.08180017 0.19200670
## Music - Sport 0.22512986 0.0004 *** 0.08822643 0.36203329
##
## Study: Y4Out ~ "Group"
##
## HSD Test for AccStandard
##
## Mean Square Error: 0.09322311
##
## Group, means
##
## AccStandard std r Min Max
## Control 0.5568437 0.3226933 48 0.0309 0.9955
## Music 0.5639854 0.3105934 48 0.0131 0.9787
## Sport 0.6235689 0.2794730 45 0.0091 0.9637
##
## Alpha: 0.05 ; DF Error: 138
## Critical Value of Studentized Range: 3.350657
##
## Comparison between treatments means
##
## difference pvalue signif. LCL UCL
## Control - Music -0.007141667 0.9928 -0.1548045 0.14052115
## Control - Sport -0.066725139 0.5447 -0.2168288 0.08337856
## Music - Sport -0.059583472 0.6157 -0.2096872 0.09052022
##
## Study: Y3Out ~ "Group"
##
## HSD Test for AccStandard
##
## Mean Square Error: 0.07440126
##
## Group, means
##
## AccStandard std r Min Max
## Control 0.4537827 0.2955247 48 0.01727143 0.9988
## Music 0.7285660 0.2513071 48 0.09480000 1.0000
## Sport 0.5327533 0.2694414 45 0.01520000 0.9807
##
## Alpha: 0.05 ; DF Error: 138
## Critical Value of Studentized Range: 3.350657
##
## Comparison between treatments means
##
## difference pvalue signif. LCL UCL
## Control - Music -0.2747832 0.0000 *** -0.40669985 -0.14286662
## Control - Sport -0.0789706 0.3462 -0.21306780 0.05512661
## Music - Sport 0.1958126 0.0021 ** 0.06171543 0.32990984
Ok, now let’s actually look at our pretend YEAR * GROUP * CONDITION interaction
So keep those subsets.
Run a RM anova for group * condition within each year
flanker_anova3 = ezANOVA(data = Y3,
dv = AccStandard,
wid = ID,
within = Condition,
between = Group,
detailed = F)
## Warning: Converting "ID" to factor for ANOVA.
## Warning: You have removed one or more levels from variable "Condition".
## Refactoring for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified
## a well-considered value for the type argument to ezANOVA().
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 Group 2 44 11.630578 8.813404e-05 * 0.17560790
## 3 Condition 2 88 3.890085 2.404979e-02 * 0.05014061
## 4 Group:Condition 4 88 3.253038 1.544515e-02 * 0.08112364
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 3 Condition 0.9632839 0.4474218
## 4 Group:Condition 0.9632839 0.4474218
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF]
## 3 Condition 0.9645842 0.02550448 * 1.007974 0.02404979
## 4 Group:Condition 0.9645842 0.01671684 * 1.007974 0.01544515
## p[HF]<.05
## 3 *
## 4 *
flanker_anova4 = ezANOVA(data = Y4,
dv = AccStandard,
wid = ID,
within = Condition,
between = Group,
detailed = F)
## Warning: Converting "ID" to factor for ANOVA.
## Warning: You have removed one or more levels from variable "Condition".
## Refactoring for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified
## a well-considered value for the type argument to ezANOVA().
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 Group 2 44 1.099999 3.418501e-01 0.01763845
## 3 Condition 2 88 56.175094 1.898840e-16 * 0.45001672
## 4 Group:Condition 4 88 1.828751 1.304065e-01 0.05057993
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 3 Condition 0.7328087 0.00125112 *
## 4 Group:Condition 0.7328087 0.00125112 *
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF]
## 3 Condition 0.7891469 1.769696e-13 * 0.8135366 8.017251e-14
## 4 Group:Condition 0.7891469 1.471664e-01 0.8135366 1.451299e-01
## p[HF]<.05
## 3 *
## 4
flanker_anova5 = ezANOVA(data = Y5,
dv = AccStandard,
wid = ID,
within = Condition,
between = Group,
detailed = F)
## Warning: Converting "ID" to factor for ANOVA.
## Warning: You have removed one or more levels from variable "Condition".
## Refactoring for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified
## a well-considered value for the type argument to ezANOVA().
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 Group 2 44 9.013245 5.239145e-04 * 0.1466672
## 3 Condition 2 88 9.975428 1.245345e-04 * 0.1162974
## 4 Group:Condition 4 88 10.929622 3.062081e-07 * 0.2238323
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 3 Condition 0.9902836 0.8106449
## 4 Group:Condition 0.9902836 0.8106449
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF]
## 3 Condition 0.9903771 1.322129e-04 * 1.036834 1.245345e-04
## 4 Group:Condition 0.9903771 3.441497e-07 * 1.036834 3.062081e-07
## p[HF]<.05
## 3 *
## 4 *
So it looks like it’s not a group * condition effect at every year. Let’s split it up based on condition.
Con = subset(flanker_long, Condition == "Congruent")
Inc = subset(flanker_long, Condition == "Incongruent")
Neut = subset(flanker_long, Condition == "Neutral")
Now run the RM ANOVA for GROUP * YEAR within each condition.
con_anova = ezANOVA(data = Con,
dv = AccStandard,
wid = ID,
within = Year,
between = Group,
detailed = F)
## Warning: Converting "ID" to factor for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified
## a well-considered value for the type argument to ezANOVA().
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 Group 2 44 13.002999 3.655807e-05 * 0.1898688
## 3 Year 2 88 33.719352 1.344728e-11 * 0.3162242
## 4 Group:Year 4 88 9.305931 2.556283e-06 * 0.2033557
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 3 Year 0.8723394 0.05305636
## 4 Group:Year 0.8723394 0.05305636
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF]
## 3 Year 0.8867916 1.530853e-10 * 0.9213574 7.281456e-11
## 4 Group:Year 0.8867916 8.059822e-06 * 0.9213574 5.673412e-06
## p[HF]<.05
## 3 *
## 4 *
and you continue until you figure it out!
We did this above, but here it is again. Y5Out is the name of an aov object.
##
## Study: Y5Out ~ "Group"
##
## HSD Test for AccStandard
##
## Mean Square Error: 0.07754781
##
## Group, means
##
## AccStandard std r Min Max
## Control 0.6854810 0.2850832 48 0.0088 1
## Music 0.8555076 0.2571079 48 0.0162 1
## Sport 0.6303778 0.2929041 45 0.0244 1
##
## Alpha: 0.05 ; DF Error: 138
## Critical Value of Studentized Range: 3.350657
##
## Comparison between treatments means
##
## difference pvalue signif. LCL UCL
## Control - Music -0.17002660 0.0092 ** -0.30470381 -0.03534939
## Control - Sport 0.05510326 0.6074 -0.08180017 0.19200670
## Music - Sport 0.22512986 0.0004 *** 0.08822643 0.36203329
You can also do a simple t.test
##
## Welch Two Sample t-test
##
## data: Y3$AccStandard and Y4$AccStandard
## t = -0.22517, df = 279.71, p-value = 0.822
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.07833500 0.06225327
## sample estimates:
## mean of x mean of y
## 0.5725293 0.5805702
This way of looking at the data is good because it lets you include people who don’t have data for all years. It requires a very specific format of data (kind of like our easy long-form transform thingy above).
We are going to use a NEW data set. It has LOTS of missing datapoints, and isn’t in the best format, so you can see how we fix that!
Let’s check missing data rates.
## subject_id group age_1 age_2
## Length:66 Min. :1.000 Min. :77.00 Min. : 88.00
## Class :character 1st Qu.:1.000 1st Qu.:79.00 1st Qu.: 92.00
## Mode :character Median :2.000 Median :82.00 Median : 94.00
## Mean :2.046 Mean :82.43 Mean : 94.34
## 3rd Qu.:3.000 3rd Qu.:85.00 3rd Qu.: 96.00
## Max. :3.000 Max. :88.00 Max. :101.00
## NA's :1 NA's :1 NA's :1
## age_3 age_4 age_5 backwards_1
## Min. : 99.0 Min. :110.0 Min. :121.0 Min. :1.000
## 1st Qu.:104.0 1st Qu.:116.0 1st Qu.:128.0 1st Qu.:2.000
## Median :107.0 Median :119.0 Median :131.0 Median :3.000
## Mean :106.3 Mean :118.4 Mean :130.6 Mean :3.492
## 3rd Qu.:109.0 3rd Qu.:121.0 3rd Qu.:133.0 3rd Qu.:5.000
## Max. :113.0 Max. :126.0 Max. :138.0 Max. :6.000
## NA's :1 NA's :1 NA's :1 NA's :1
## backwards_2 backwards_3 backwards_4 backwards_5
## Min. :2.000 Min. :3.00 Min. :3.000 Min. :3.000
## 1st Qu.:3.000 1st Qu.:3.00 1st Qu.:3.000 1st Qu.:4.000
## Median :4.000 Median :4.00 Median :5.000 Median :5.000
## Mean :4.031 Mean :4.46 Mean :4.508 Mean :4.627
## 3rd Qu.:5.000 3rd Qu.:6.00 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :6.000 Max. :6.00 Max. :6.000 Max. :6.000
## NA's :1 NA's :3 NA's :5 NA's :7
## forward_1 forward_2 forward_3 forward_4
## Min. : 4.000 Min. : 5.00 Min. : 5.0 Min. : 5.000
## 1st Qu.: 5.000 1st Qu.: 6.00 1st Qu.: 7.0 1st Qu.: 7.000
## Median : 7.000 Median : 8.00 Median : 9.0 Median :10.000
## Mean : 7.033 Mean : 7.80 Mean : 9.1 Mean : 9.441
## 3rd Qu.: 9.000 3rd Qu.: 9.25 3rd Qu.:11.0 3rd Qu.:12.000
## Max. :10.000 Max. :10.00 Max. :13.0 Max. :13.000
## NA's :6 NA's :6 NA's :6 NA's :7
## forward_5
## Min. : 5.000
## 1st Qu.: 7.000
## Median : 9.000
## Mean : 8.966
## 3rd Qu.:11.000
## Max. :13.000
## NA's :8
Put it in longform, using our fancy method.
digit_span_long <- digit_span %>%
gather(key = var, value = val, -(subject_id:group)) %>%
separate(var, c("var", "time")) %>%
spread(key = var, value = val) %>%
mutate(time = as.integer(time),
age_in_yr = age / 12, # age in years
age_in_yr5 = age_in_yr - 5, # age in years centered at 5 years old
cond = recode_factor(group,
`3` = "control",
`1` = "music",
`2` = "sport"),
time0 = time - 1)
Here, we added 2 extra variables to help with interpretation of our model coefficients.
cond -> we reordered our groups such that control was the first group, followed by music and sport. Why? When we do our models, this reordering will make the intercept (\(\beta_0\)) correspond to the control group. Since group is a factor with 3 levels, when doing linear models, you give each factor level a dummy variable. By doing this, the level with no dummy variable is control and is now our baseline. In this way, the coefficients (\(\beta_1\)) associated with sport and music are the changes in your variable of interest from control. See below for examples.
time0 -> We subtracted 1 from our time variable (the year of the study, eg. Year 1 = 1, Year 2 = 2), so that Year 1 would now be year 0. We did this so that the intercept would be the mean at time 0.
By changing these 2 variables, we make it easier to interpret the results of the linear growth models. These changes make the intercept coefficient reflect the estimated outcome score (mean) for the control group at Year 1. Now our intercept is more meaningful and the results are easier to interpret.
digit_span_long %>%
select(time, age, backwards, forward, cond) %>%
ggpairs(lower = list(continuous = wrap("points", alpha = 0.5, size = 0.2)))
Models
Forwards, by time
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: forward ~ cond * time0 + (time0 | subject_id)
## Data: digit_span_long
##
## REML criterion at convergence: 1348.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.81212 -0.85073 0.03932 0.74603 2.30620
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject_id (Intercept) 0.1567 0.3959
## time0 0.2115 0.4599 -1.00
## Residual 4.9054 2.2148
## Number of obs: 297, groups: subject_id, 62
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 7.36955 0.38582 151.21853 19.101 < 2e-16 ***
## condmusic -0.04576 0.55924 150.92087 -0.082 0.93489
## condsport 0.01388 0.55030 155.36021 0.025 0.97991
## time0 0.51847 0.18658 68.46488 2.779 0.00704 **
## condmusic:time0 0.01037 0.26926 67.55687 0.039 0.96938
## condsport:time0 0.10759 0.26209 69.42060 0.410 0.68271
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndmsc cndspr time0 cndm:0
## condmusic -0.690
## condsport -0.701 0.484
## time0 -0.787 0.543 0.552
## condmsc:tm0 0.545 -0.788 -0.382 -0.693
## cndsprt:tm0 0.560 -0.386 -0.788 -0.712 0.493
## convergence code: 0
## boundary (singular) fit: see ?isSingular
Intercept is the average forwards score for the control group at Year 1. (see above for explanation on how we made it so the intercept would be interpreted this way).
At Year 1, the difference between the control mean and the music mean is -0.05 (the condmusic coefficient). So the music mean at Year 1 is the intercept plus that coefficient: 7.37 + (-0.05) = 7.32.
Forwards, by age
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: forward ~ cond * age_in_yr5 + (age_in_yr5 | subject_id)
## Data: digit_span_long
##
## REML criterion at convergence: 1349.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.86334 -0.83188 0.01543 0.73994 2.32325
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject_id (Intercept) 1.4645 1.2102
## age_in_yr5 0.1921 0.4382 -1.00
## Residual 4.9671 2.2287
## Number of obs: 297, groups: subject_id, 62
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 6.3977530 0.6792200 88.9590207 9.419 5.11e-15 ***
## condmusic -0.0551157 0.9906320 88.9388735 -0.056 0.95576
## condsport 0.0046551 0.9638540 91.7058521 0.005 0.99616
## age_in_yr5 0.5173504 0.1812443 66.4550439 2.854 0.00574 **
## condmusic:age_in_yr5 -0.0005214 0.2631795 66.6541075 -0.002 0.99843
## condsport:age_in_yr5 0.0637871 0.2553870 67.7088557 0.250 0.80352
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndmsc cndspr ag_n_5 cndm:__5
## condmusic -0.686
## condsport -0.705 0.483
## age_in_yr5 -0.936 0.642 0.660
## cndmsc:g__5 0.645 -0.938 -0.454 -0.689
## cndsprt:__5 0.665 -0.456 -0.936 -0.710 0.489
## convergence code: 0
## boundary (singular) fit: see ?isSingular
Backwards, by time
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: backwards ~ cond * time0 + (time0 | subject_id)
## Data: digit_span_long
##
## REML criterion at convergence: 1081.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.06994 -0.90131 0.06774 0.79287 1.76036
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject_id (Intercept) 0.22128 0.4704
## time0 0.02751 0.1659 -1.00
## Residual 1.72844 1.3147
## Number of obs: 313, groups: subject_id, 65
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 3.80161 0.23509 62.93213 16.171 <2e-16 ***
## condmusic -0.08691 0.34430 62.62945 -0.252 0.8015
## condsport -0.30973 0.33582 62.64078 -0.922 0.3599
## time0 0.20629 0.09584 88.45981 2.152 0.0341 *
## condmusic:time0 0.05640 0.13957 86.65475 0.404 0.6871
## condsport:time0 0.15908 0.13653 87.43059 1.165 0.2471
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndmsc cndspr time0 cndm:0
## condmusic -0.683
## condsport -0.700 0.478
## time0 -0.837 0.571 0.586
## condmsc:tm0 0.575 -0.837 -0.402 -0.687
## cndsprt:tm0 0.587 -0.401 -0.838 -0.702 0.482
## convergence code: 0
## boundary (singular) fit: see ?isSingular
Backwards, by age
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: backwards ~ cond * age_in_yr5 + (age_in_yr5 | subject_id)
## Data: digit_span_long
##
## REML criterion at convergence: 1081.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.1004 -0.8688 0.0534 0.8110 1.8042
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject_id (Intercept) 0.61809 0.7862
## age_in_yr5 0.02715 0.1648 -1.00
## Residual 1.72753 1.3144
## Number of obs: 313, groups: subject_id, 65
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 3.39794 0.39093 64.93891 8.692 1.74e-12 ***
## condmusic -0.15422 0.57206 64.77860 -0.270 0.7883
## condsport -0.53150 0.55745 63.95860 -0.953 0.3440
## age_in_yr5 0.20946 0.09326 88.18342 2.246 0.0272 *
## condmusic:age_in_yr5 0.04631 0.13628 87.71905 0.340 0.7348
## condsport:age_in_yr5 0.14042 0.13324 87.98418 1.054 0.2948
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndmsc cndspr ag_n_5 cndm:__5
## condmusic -0.683
## condsport -0.701 0.479
## age_in_yr5 -0.944 0.645 0.662
## cndmsc:g__5 0.646 -0.944 -0.453 -0.684
## cndsprt:__5 0.661 -0.451 -0.944 -0.700 0.479
## convergence code: 0
## boundary (singular) fit: see ?isSingular
Results
texreg::htmlreg(list(m1ft, m1fa, m1bt, m1ba),
custom.model.names = c("Forwards by time", "Forwards by age",
"Backwards by time", "Backwards by age"),
caption = "Results of Growth Modeling")

| | Forwards by time | Forwards by age | Backwards by time | Backwards by age | |
|---|---|---|---|---|---|
| (Intercept) | 7.37*** | 6.40*** | 3.80*** | 3.40*** | |
| (0.39) | (0.68) | (0.24) | (0.39) | ||
| condmusic | -0.05 | -0.06 | -0.09 | -0.15 | |
| (0.56) | (0.99) | (0.34) | (0.57) | ||
| condsport | 0.01 | 0.00 | -0.31 | -0.53 | |
| (0.55) | (0.96) | (0.34) | (0.56) | ||
| time0 | 0.52** | 0.21* | |||
| (0.19) | (0.10) | ||||
| condmusic:time0 | 0.01 | 0.06 | |||
| (0.27) | (0.14) | ||||
| condsport:time0 | 0.11 | 0.16 | |||
| (0.26) | (0.14) | ||||
| age_in_yr5 | 0.52** | 0.21* | |||
| (0.18) | (0.09) | ||||
| condmusic:age_in_yr5 | -0.00 | 0.05 | |||
| (0.26) | (0.14) | ||||
| condsport:age_in_yr5 | 0.06 | 0.14 | |||
| (0.26) | (0.13) | ||||
| AIC | 1368.34 | 1369.59 | 1101.10 | 1101.77 | |
| BIC | 1405.27 | 1406.53 | 1138.56 | 1139.23 | |
| Log Likelihood | -674.17 | -674.79 | -540.55 | -540.88 | |
| Num. obs. | 297 | 297 | 313 | 313 | |
| Num. groups: subject_id | 62 | 62 | 65 | 65 | |
| Var: subject_id (Intercept) | 0.16 | 1.46 | 0.22 | 0.62 | |
| Var: subject_id time0 | 0.21 | 0.03 | |||
| Cov: subject_id (Intercept) time0 | -0.18 | -0.08 | |||
| Var: Residual | 4.91 | 4.97 | 1.73 | 1.73 | |
| Var: subject_id age_in_yr5 | 0.19 | 0.03 | |||
| Cov: subject_id (Intercept) age_in_yr5 | -0.53 | -0.13 | |||
| ***p < 0.001, **p < 0.01, *p < 0.05 | | | | | |
But what does it mean??? This is a nice table that puts all 4 models above together.
A simple correlation
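The output below comes from cor.test(), which takes the two vectors you want to correlate:

```r
cor.test(music$WASI, music$Y4_Total_AccStandard)
```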
##
## Pearson's product-moment correlation
##
## data: music$WASI and music$Y4_Total_AccStandard
## t = -1.5745, df = 17, p-value = 0.1338
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.6978707 0.1163155
## sample estimates:
## cor
## -0.3567409
How to report
"In the music group, WASI scores and flanker accuracy were not significantly correlated, r(17) = -0.36, p = .13." (The number in parentheses is the degrees of freedom, n - 2, which you can read straight from the df in the output above.)
That's great, but not really that helpful.
ALISON HELP! how do we get the p values and stuff in a nice little tables and stuff like that
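In the meantime, here is one option: broom (loaded above) turns the htest object from cor.test() into a one-row data frame, which you can then hand to kable() or tableGrob() like any other table. The object names mirror the correlation above:

```r
library(broom)
ct <- cor.test(music$WASI, music$Y4_Total_AccStandard)
tidy(ct) #one row: estimate, statistic, p.value, conf.low, conf.high, method
```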
### Adding a Co-Variate (partial correlation)
ALISON HELP!
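Until this section is written properly, here is a hedged sketch using partial.r() from psych (loaded above). The data frame and column names are assumptions; swap in whatever covariate you want to partial out:

```r
library(psych)
# Correlation between WASI and AccStandard, controlling for Age
partial.r(data = flanker_standard, x = c("WASI", "AccStandard"), y = "Age")
```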
ALISON, HELP AGAIN!
Please see the accompanying R notebook "Statistical Learnining.rmd", curated by Priscilla. In the meantime, here are a few things.
lm.f <- lm(WASI ~ AccStandard, data = flanker_long) #to run a regression; notice the ~
lm.f #prints the coefficients

##
## Call:
## lm(formula = WASI ~ AccStandard, data = flanker_long)
##
## Coefficients:
## (Intercept) AccStandard
## 101.275 3.842
##
## Call:
## lm(formula = WASI ~ AccStandard, data = flanker_long)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.116 -12.845 0.489 9.406 51.545
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 101.275 2.139 47.352 <2e-16 ***
## AccStandard 3.842 3.071 1.251 0.212
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.24 on 421 degrees of freedom
## Multiple R-squared: 0.003703, Adjusted R-squared: 0.001337
## F-statistic: 1.565 on 1 and 421 DF, p-value: 0.2116
What does the Multiple R-squared stand for?
In general, R-squared is the proportion of variation in the response variable that is explained by the model, or how close the data are to the regression line. When you add predictors to your model, the multiple R-squared will always increase, because every predictor explains at least some portion of the variance.
Adjusted R-squared guards against this increase by penalizing the number of predictors in the model. It therefore balances the most parsimonious model against the best-fitting one. Generally, a large gap between your multiple and adjusted R-squared indicates you may have overfit your model.
How do we interpret the slope for AccStandard? If AccStandard increases by 1, the predicted WASI score increases by 3.842.
do we reject null hypothesis?
No.
How do we interpret the intercept? Is the intercept meaningful in this case? When a predictor variable does not have a meaningful 0, it is often preferable to center the predictor variable around its mean.
When AccStandard is 0, the predicted WASI score is 101.275. But an accuracy score of 0 will never (or very rarely) occur here, so the intercept doesn't really mean anything.
We can center it around the mean
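A sketch of how that centering was probably done (the new column name wasi2 matches the model output below):

```r
# Subtract the mean so that 0 on wasi2 = the average WASI score
flanker_long$wasi2 <- flanker_long$WASI - mean(flanker_long$WASI, na.rm = TRUE)
lm(wasi2 ~ AccStandard, data = flanker_long)
```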
##
## Call:
## lm(formula = wasi2 ~ AccStandard, data = flanker_long)
##
## Coefficients:
## (Intercept) AccStandard
## -2.406 3.842
##
## Call:
## lm(formula = wasi2 ~ AccStandard, data = flanker_long)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.116 -12.845 0.489 9.406 51.545
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.406 2.139 -1.125 0.261
## AccStandard 3.842 3.071 1.251 0.212
##
## Residual standard error: 19.24 on 421 degrees of freedom
## Multiple R-squared: 0.003703, Adjusted R-squared: 0.001337
## F-statistic: 1.565 on 1 and 421 DF, p-value: 0.2116
Confidence interval
## 2.5 % 97.5 %
## (Intercept) -6.609924 1.798107
## AccStandard -2.194658 9.877683
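Those intervals come from confint() on the fitted model (the object name lm.c is an assumption):

```r
confint(lm.c, level = 0.95) #95% CIs for the intercept and the AccStandard slope
```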
Basically, this means that the reference group is the first level of the factor: the intercept is the mean of that first group, and each remaining coefficient compares its level to the reference group.
You have to be careful about what you choose as your reference group. Sometimes it might not even make sense to have a reference group (for example, if you're comparing genders); in that case you might just use the largest group as the reference. But it's something to be aware of.
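R uses the first factor level as the reference by default; relevel() changes it. The level name here is an assumption about how Group is coded:

```r
# Make Control the reference group, then refit the model
flanker_long$Group <- relevel(flanker_long$Group, ref = "Control")
```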
##
## Call:
## lm(formula = AccStandard ~ Group, data = flanker_long)
##
## Coefficients:
## (Intercept) GroupSport GroupControl
## 0.7160 -0.1205 -0.1507
##
## Call:
## lm(formula = AccStandard ~ Group, data = flanker_long)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.70292 -0.23952 0.06321 0.27438 0.43463
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.71602 0.02488 28.776 < 2e-16 ***
## GroupSport -0.12045 0.03577 -3.367 0.000829 ***
## GroupControl -0.15065 0.03519 -4.281 2.31e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2986 on 420 degrees of freedom
## Multiple R-squared: 0.04638, Adjusted R-squared: 0.04184
## F-statistic: 10.21 on 2 and 420 DF, p-value: 4.663e-05
This is a way of computing ANOVA-like stats based on regression methods. It allows for baseline variability, which is pretty great.
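The model summarized below was fit with a formula like this; the random intercept for ID is what allows each participant their own baseline (the object name m.mixed is an assumption):

```r
library(lmerTest)
m.mixed <- lmer(AccStandard ~ Group * Year + (1 | ID), data = flanker_long)
summary(m.mixed)
```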
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: AccStandard ~ Group * Year + (1 | ID)
## Data: flanker_long
##
## REML criterion at convergence: 172.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.9280 -0.7791 0.1746 0.7776 1.9086
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.0001739 0.01319
## Residual 0.0815577 0.28558
## Number of obs: 423, groups: ID, 47
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.72857 0.04135 264.83967 17.619 < 2e-16 ***
## GroupSport -0.19581 0.05945 264.83967 -3.294 0.00112 **
## GroupControl -0.27478 0.05848 264.83967 -4.699 4.21e-06 ***
## YearY4 -0.16458 0.05829 369.99930 -2.823 0.00501 **
## YearY5 0.12694 0.05829 369.99930 2.178 0.03007 *
## GroupSport:YearY4 0.25540 0.08380 369.99931 3.048 0.00247 **
## GroupControl:YearY4 0.26764 0.08244 369.99930 3.246 0.00128 **
## GroupSport:YearY5 -0.02932 0.08380 369.99931 -0.350 0.72666
## GroupControl:YearY5 0.10476 0.08244 369.99930 1.271 0.20464
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) GrpSpr GrpCnt YearY4 YearY5 GS:YY4 GC:YY4 GS:YY5
## GroupSport -0.696
## GroupContrl -0.707 0.492
## YearY4 -0.705 0.490 0.498
## YearY5 -0.705 0.490 0.498 0.500
## GrpSprt:YY4 0.490 -0.705 -0.347 -0.696 -0.348
## GrpCntr:YY4 0.498 -0.347 -0.705 -0.707 -0.354 0.492
## GrpSprt:YY5 0.490 -0.705 -0.347 -0.348 -0.696 0.500 0.246
## GrpCntr:YY5 0.498 -0.347 -0.705 -0.354 -0.707 0.246 0.500 0.492
assumptions
If there is an interaction effect, you can look at the groups individually.
This model does NOT rely on the standard assumptions the way lmer does. Documentation and simulation studies can be found here, from Koller 2016.
This package is very new. bootMer, MCMC, and Kenward-Roger approximations for degrees of freedom and p values are not yet configured and cannot be computed in any other program that I am aware of. But! I can write my own function! Working on that.
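The robust fit below comes from rlmer() in the robustlmm package (not in the library list at the top, so install it first; the object name is an assumption):

```r
library(robustlmm)
m.rob <- rlmer(AccStandard ~ Group * Year + (1 | ID), data = flanker_long)
summary(m.rob) #note: no p-values, for the reasons described above
```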
## Robust linear mixed model fit by DAStau
## Formula: AccStandard ~ Group * Year + (1 | ID)
## Data: flanker_long
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8172 -0.7329 0.1446 0.6869 1.8206
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.00000 0.000
## Residual 0.09427 0.307
## Number of obs: 423, groups: ID, 47
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.74271 0.04545 16.340
## GroupSport -0.20671 0.06534 -3.163
## GroupControl -0.30292 0.06428 -4.712
## YearY4 -0.16726 0.06428 -2.602
## YearY5 0.13849 0.06428 2.154
## GroupSport:YearY4 0.27355 0.09241 2.960
## GroupControl:YearY4 0.29206 0.09091 3.213
## GroupSport:YearY5 -0.03246 0.09241 -0.351
## GroupControl:YearY5 0.12181 0.09091 1.340
##
## Correlation of Fixed Effects:
## (Intr) GrpSpr GrpCnt YearY4 YearY5 GS:YY4 GC:YY4 GS:YY5
## GroupSport -0.696
## GroupContrl -0.707 0.492
## YearY4 -0.707 0.492 0.500
## YearY5 -0.707 0.492 0.500 0.500
## GrpSprt:YY4 0.492 -0.707 -0.348 -0.696 -0.348
## GrpCntr:YY4 0.500 -0.348 -0.707 -0.707 -0.354 0.492
## GrpSprt:YY5 0.492 -0.707 -0.348 -0.348 -0.696 0.500 0.246
## GrpCntr:YY5 0.500 -0.348 -0.707 -0.354 -0.707 0.246 0.500 0.492
##
## Robustness weights for the residuals:
## 345 weights are ~= 1. The remaining 78 ones are summarized as
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.477 0.753 0.872 0.857 0.968 0.998
##
## Robustness weights for the random effects:
## All 47 weights are ~= 1.
##
## Rho functions used for fitting:
## Residuals:
## eff: smoothed Huber (k = 1.345, s = 10)
## sig: smoothed Huber, Proposal II (k = 1.345, s = 10)
## Random Effects, variance component 1 (ID):
## eff: smoothed Huber (k = 1.345, s = 10)
## vcp: smoothed Huber, Proposal II (k = 1.345, s = 10)
With more than 2 time points, we can incorporate a robust latent growth analysis, so that each participant is allowed an individual trajectory over time. Field and Wilcox, 2017. Incorporating the growth model also gives us p values and confidence intervals!
I am still learning how to use this!! Come back for complete notes.
# library(lavaan)
# #ideally, your first time point (here, Year3) is baseline. but for illustration purposes, here we are
#
# myModel3 <- '
# i =~ 1*Year3 + 1*Year4 + 1*Year5
# s =~ 0*Year3 + 2*Year4 + 8*Year5
#
# #variances/covariances
# i ~~ i
# s ~~ s
# i ~~ s
#
# #intercepts
# i ~ 1
# s ~ 1
#
# Year3 ~ 0
# Year4 ~ 0
# Year5 ~ 0
#
# i + s ~ Group
# '
#
# rctFit <- growth(myModel3, data = flanker_long)
# summary(rctFit)

Coming soon!
Also coming soon!
NOTE: this is from an earlier version of this notebook. Feel free to study the non-parametric tests below. But again, these are still considered "classic methods" and have considerable pitfalls. Please use caution. I recommend deep-diving into Rand's book from the link above. Again, this is not a stats course.
ALL of the above tests make several assumptions about the data used:
1. Normality: data have a normal distribution
2. Homogeneity of Variance: variances are the same between groups
3. Linearity: data have a linear relationship
4. Independence: data are independent
However, this may not be true for all data you may encounter. Below are some commonly-used non-parametric tests for those instances that the assumptions above are violated.
This test is a rank-based test: instead of raw scores, the data are transformed into ranks. It begins by calculating the difference between the "observed values" and the "default value". This is for data that are INTERVAL or RATIO, not ORDINAL, and it is additionally only for one-sample (or paired) data. Note: the default value is something like a middle, neutral value, or a previously published value.
The null hypothesis is that the data is symmetric around the “default value”, and thus two samples are identical populations.
Parametric equivalent: paired t-test
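The test below is wilcox.test() with paired = TRUE; the two median() calls produce the values printed after it:

```r
wilcox.test(digit_span$age_1, digit_span$age_2, paired = TRUE)
median(digit_span$age_1)
median(digit_span$age_2)
```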
##
## Wilcoxon signed rank test with continuity correction
##
## data: digit_span$age_1 and digit_span$age_2
## V = 0, p-value = 1.19e-12
## alternative hypothesis: true location shift is not equal to 0
## [1] 82
## [1] 94
What does it mean?
V corresponds to the sum of the ranks assigned to the differences.
I’m not really sure how to interpret this. I don’t understand this test well enough. To be learned!!!
How to report
“A Wilcoxon Signed-Ranks test indicated that age at year 2 (Mdn = 94) was greater than age at year 1 (Mdn = 82), V = 0, p < 0.001.”
The parametric equivalent to this one is a 1-way ANOVA
Null hypothesis is that groups are equal on some dependent variable.
We will use our flanker data to explore if groups differ on accuracy
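The test below is a single call to kruskal.test() with a formula:

```r
kruskal.test(AccStandard ~ Group, data = flanker_long)
```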
##
## Kruskal-Wallis rank sum test
##
## data: AccStandard by Group
## Kruskal-Wallis chi-squared = 24.274, df = 2, p-value = 5.359e-06
How to report
“There was a significant difference between groups on accuracy (H(2) = 24.274, p < 0.001).”
Parametric equivalent: Repeated Measures ANOVA
##
## Friedman rank sum test
##
## data: cbind(flanker_long$WASI, flanker_long$Group, flanker_long$Year)
## Friedman chi-squared = 692.18, df = 2, p-value < 2.2e-16
What does it mean?
Again, I don’t really understand how this one works. To be learned!
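For what it's worth, the documented form of the call treats one factor as the repeated condition and another as the blocking variable (the subject). A more standard sketch would be something like this (ID as the subject column is an assumption, and this form requires exactly one observation per ID per Year):

```r
friedman.test(WASI ~ Year | ID, data = flanker_long)
```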
How to report
“A non-parametric Friedman test of differences among repeated measures was conducted and rendered a Chi-square value of 692.18, which was significant (p < 0.001).”
Now, to the fun part. R is SO SO GREAT for doing data visualization. It takes a little getting used to, but then, your options are literally limitless. Below is some of what you can do but, as always, there is so much more we have yet to learn.
Let’s start with tables, since we’ve been doing so much with data above.
Kable is used to make really pretty tables in HTML format. These look nice in knitted R Markdown documents like this one. However, I don't think you can put them in grobs (below), so we will continue with grob-style tables later.
TBH……. the document below does a better job of explaining kable than I ever will. So… here is the link!
but, in the meantime, here is a simple example:
| Group | Condition | Year | N | mean | sd | se |
|---|---|---|---|---|---|---|
| Music | Congruent | Y3 | 16 | 683.4739 | 131.25684 | 32.81421 |
| Music | Congruent | Y4 | 16 | 603.9375 | 125.82827 | 31.45707 |
| Music | Congruent | Y5 | 16 | 577.2344 | 127.44079 | 31.86020 |
| Music | Incongruent | Y3 | 16 | 750.6588 | 102.77855 | 25.69464 |
| Music | Incongruent | Y4 | 16 | 651.1042 | 145.37000 | 36.34250 |
| Music | Incongruent | Y5 | 16 | 619.1771 | 133.76510 | 33.44127 |
| Music | Neutral | Y3 | 16 | 723.4173 | 119.32385 | 29.83096 |
| Music | Neutral | Y4 | 16 | 547.7054 | 125.41262 | 31.35315 |
| Music | Neutral | Y5 | 16 | 503.1964 | 121.37974 | 30.34494 |
| Sport | Congruent | Y3 | 15 | 689.5363 | 139.11747 | 35.91998 |
| Sport | Congruent | Y4 | 15 | 695.0056 | 117.13139 | 30.24319 |
| Sport | Congruent | Y5 | 15 | 624.7333 | 111.77598 | 28.86043 |
| Sport | Incongruent | Y3 | 15 | 740.8833 | 91.40338 | 23.60025 |
| Sport | Incongruent | Y4 | 15 | 749.3544 | 179.26647 | 46.28640 |
| Sport | Incongruent | Y5 | 15 | 668.4333 | 100.05226 | 25.83338 |
| Sport | Neutral | Y3 | 15 | 771.7964 | 109.38517 | 28.24313 |
| Sport | Neutral | Y4 | 15 | 609.0857 | 110.96122 | 28.65006 |
| Sport | Neutral | Y5 | 15 | 525.0952 | 78.39185 | 20.24069 |
| Control | Congruent | Y3 | 16 | 667.3185 | 105.34082 | 26.33520 |
| Control | Congruent | Y4 | 16 | 599.5625 | 87.32704 | 21.83176 |
| Control | Congruent | Y5 | 16 | 533.2031 | 64.51687 | 16.12922 |
| Control | Incongruent | Y3 | 16 | 724.1630 | 99.52525 | 24.88131 |
| Control | Incongruent | Y4 | 16 | 639.4479 | 132.49195 | 33.12299 |
| Control | Incongruent | Y5 | 16 | 565.7188 | 80.83267 | 20.20817 |
| Control | Neutral | Y3 | 16 | 693.0483 | 119.40328 | 29.85082 |
| Control | Neutral | Y4 | 16 | 545.7946 | 109.17146 | 27.29287 |
| Control | Neutral | Y5 | 16 | 470.0804 | 91.85918 | 22.96479 |
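A table like the one above can be produced by piping a summary data frame through kable() and kable_styling() (both loaded above; the object name flankersum_rt matches the grob example later on, and the styling options are just one choice):

```r
flankersum_rt %>%
  kable(digits = 3) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
```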
Again, you can do all sorts of cool stuff like grouping by a variable, coloring, footnotes, etc. Look at that little page; it is great.
This is fun. But really, it is better when you are making Sweave + PDF type documents.
| Group | Condition | Year | N | mean | sd | se |
| Music | Congruent | Y3 | 16 | 683.474 | 131.257 | 32.814 |
| Music | Congruent | Y4 | 16 | 603.938 | 125.828 | 31.457 |
| Music | Congruent | Y5 | 16 | 577.234 | 127.441 | 31.860 |
| Music | Incongruent | Y3 | 16 | 750.659 | 102.779 | 25.695 |
| Music | Incongruent | Y4 | 16 | 651.104 | 145.370 | 36.342 |
| Music | Incongruent | Y5 | 16 | 619.177 | 133.765 | 33.441 |
| Music | Neutral | Y3 | 16 | 723.417 | 119.324 | 29.831 |
| Music | Neutral | Y4 | 16 | 547.705 | 125.413 | 31.353 |
| Music | Neutral | Y5 | 16 | 503.196 | 121.380 | 30.345 |
| Sport | Congruent | Y3 | 15 | 689.536 | 139.117 | 35.920 |
| Sport | Congruent | Y4 | 15 | 695.006 | 117.131 | 30.243 |
| Sport | Congruent | Y5 | 15 | 624.733 | 111.776 | 28.860 |
| Sport | Incongruent | Y3 | 15 | 740.883 | 91.403 | 23.600 |
| Sport | Incongruent | Y4 | 15 | 749.354 | 179.266 | 46.286 |
| Sport | Incongruent | Y5 | 15 | 668.433 | 100.052 | 25.833 |
| Sport | Neutral | Y3 | 15 | 771.796 | 109.385 | 28.243 |
| Sport | Neutral | Y4 | 15 | 609.086 | 110.961 | 28.650 |
| Sport | Neutral | Y5 | 15 | 525.095 | 78.392 | 20.241 |
| Control | Congruent | Y3 | 16 | 667.319 | 105.341 | 26.335 |
| Control | Congruent | Y4 | 16 | 599.562 | 87.327 | 21.832 |
| Control | Congruent | Y5 | 16 | 533.203 | 64.517 | 16.129 |
| Control | Incongruent | Y3 | 16 | 724.163 | 99.525 | 24.881 |
| Control | Incongruent | Y4 | 16 | 639.448 | 132.492 | 33.123 |
| Control | Incongruent | Y5 | 16 | 565.719 | 80.833 | 20.208 |
| Control | Neutral | Y3 | 16 | 693.048 | 119.403 | 29.851 |
| Control | Neutral | Y4 | 16 | 545.795 | 109.171 | 27.293 |
| Control | Neutral | Y5 | 16 | 470.080 | 91.859 | 22.965 |
Grid Extra is the library we use to make these tables for grobs (below). It basically turns the tables into little graphics
here is a comprehensive guide of the asthetic formatting of such tables
#first, set the theme. There are many dif ways to do this, obviously, but here is one
tt <- ttheme_minimal(colhead = list(fg_params = list(col= "white"),
bg_params = list(fill = "grey70")))
tbl <-tableGrob(flankersum_rt, rows = NULL, theme = tt)
#now we print it by doing this!
grid.arrange(tbl)

You can also do it from stats
t1 <- t.test(music$WASI, control$WASI)
tab <- map_df(list(t1),tidy)
tab <- tab[c("statistic", "p.value", "conf.low", "conf.high")]
colnames(tab) = c('t-statistic', "p.value", "conf.low","conf.high")
#make it pretty
tt2 <- ttheme_minimal(base_size =10, colhead = list(fg_params = list(col= "white"),
bg_params = list(fill = "grey70")))
tbl2 <-tableGrob(tab, rows = NULL, theme = tt2)
grid.arrange(tbl2)

Makes a cute little document that is formatted perfectly for APA!
There is a lot more you can do, but here are a few simple things.
For the purposes of demonstration, I have included the images of what the output looks like. (they are also .doc files in the directory)
Correlation table!!!!
Regression table!!
basic.reg <- lm(AccStandard ~ Group + Year + Condition, data = flanker_long)
apa.reg.table(basic.reg, filename = "APA_reg.doc", table.number = 2)

1 way ANOVA table!!
thisanova <- (aov(AccStandard ~ Year, data = flanker_long))
apa.aov.table(thisanova, filename = "APA_anova.doc", table.number = 3)

Descriptive stats to follow that up with.
apa.1way.table(iv = Year, dv = AccStandard, data = flanker_long,
filename = "APA_anovadescriptives.doc",
table.number = 5)

To do a table from ezANOVA, use "apa.ezANOVA.table".
Histograms look like bar graphs, but they bin a continuous variable, so the bars touch. They are good for looking at things like distributions, normality, etc. We are going to use them to look at WASI scores here.
You can use wide format data for these.
myplot = ggplot(flanker_standard, aes(x=WASI, y = Age, color = Group)) +
geom_point() +
geom_smooth(method=lm)
print(myplot)

Let's start by looking just at WASI Scores Overall
plot1 = ggplot(flanker_standard, aes(x = WASI)) + #no $ needed inside aes(); ggplot looks in the data argument
geom_histogram(aes(y = ..density..), binwidth = 1, fill = 'red', alpha = 0.5) +
geom_density(colour = 'blue') + xlab(expression(bold('WASI Score'))) +
ylab(expression(bold('Density')))
print(plot1)

Another way, with colors
barfill <- "gold1"
barlines <- "goldenrod2"
plot2 = ggplot(flanker_standard, aes(x = WASI)) +
geom_histogram(aes(y = ..density..), binwidth = 1,
colour = barlines, fill = barfill) +
scale_x_continuous(name = "WASI\nScore") +
scale_y_continuous(name = "Density") +
ggtitle ("WASI Scores") +
stat_function(fun = dnorm, colour = "red",
args = list(mean = mean(flanker_standard$WASI, na.rm = TRUE),
sd = sd(flanker_standard$WASI, na.rm = TRUE)))
print(plot2)

With groups
plot3 =ggplot(flanker_standard) +
geom_density(aes(x = WASI, fill = Group), alpha = 0.5) +
scale_x_continuous(name = "WASI\nScore") +
scale_y_continuous(name = "Density") +
ggtitle ("WASI Scores") +
theme_bw() +
theme(axis.line = element_line(size = 1, colour = "black"),
panel.grid.major = element_line(colour = "#d3d3d3"),
panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank(),
plot.title = element_text(size = 14, family = "Times", face = "bold", hjust = 0.5),
text = element_text(family = "Times"),
axis.text.x = element_text(colour = "black", size = 9),
axis.text.y = element_text(colour = "black", size = 9)) +
scale_fill_brewer(palette = "Pastel2") +
labs(fill = "Group")
print(plot3)

#relabel your grouping variable
flanker_standard$Group.f = factor(flanker_standard$Group, labels = c("Music","Sport", "Control"))
plot3 =ggplot(flanker_standard) +
geom_density(aes(x = WASI, fill = Group.f), alpha = 0.5) +
scale_x_continuous(name = "WASI\nScore") +
scale_y_continuous(name = "Density") +
ggtitle ("WASI Scores") +
theme_bw()+
theme(axis.line = element_line(size = 1, colour = "black"),
panel.grid.major = element_line(colour = "#d3d3d3"),
panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank(),
plot.title = element_text(size = 14, family = "Times", face = "bold", hjust = 0.5),
text = element_text(family = "Times"),
axis.text.x = element_text(colour = "black", size = 9),
axis.text.y = element_text(colour = "black", size = 9)) +
scale_fill_brewer(palette = "Pastel2") +
labs(fill = "Group")
print(plot3)

These are helpful for looking at outliers within each group.
Here is the most basic:
plot(WASI~Group, data=flanker_standard,
names = c("Music", "Sport", "Control"),
main = "WASI Scores",
ylab = "WASI")

But, we like using ggplot2. So here is how to make that plot in ggplot
plot4 = ggplot(flanker_standard, aes(x = Group.f, y = WASI))+
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("Group")
print(plot4)

Another kind
plot6 = ggplot(data = flanker_standard, mapping = aes(x = Group, y = WASI)) +
geom_boxplot(alpha = 0) +
geom_jitter(alpha = 0.3, color = "Blue")
print(plot6)

Categorical
Let’s do a chart for gender. Basic basic
ggplot(flanker_standard, aes(x = Group.f, fill = Gender)) +
geom_bar(position = position_dodge(.35), width = .5)

pd <- position_dodge(.35)
ggplot(flanker_standard, aes(x = Group.f, fill = Gender)) +
geom_bar(position = position_dodge(.35), width = .5)+
expand_limits(y=10) +
scale_y_continuous(breaks=0:60*1) +
xlab("Group") +
theme_bw()+
theme(axis.line = element_line(size = 1, colour = "black"),
panel.grid.major = element_line(colour = "#d3d3d3"),
panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank(),
text = element_text(family = "Times"),
axis.text.x = element_text(colour = "black", size = 10),
axis.text.y = element_text(colour = "black", size = 10)) +
scale_fill_brewer(palette = "Pastel2")

With percentages
We are going to use fake data for this
Non-Categorical
#helps so so much to use a summary table. we did this above. let's do one for WASI.
wasisum = ddply(flanker_long, c("Group"), summarise,
N = length(WASI), # calculates # of individuals per group by year
mean = mean(WASI), # calculates mean for each group by year
sd = sd(WASI), # calculates sd for each group by year
se = sd / sqrt(N))
#relabel the groups for easier visualization
wasisum$Group = factor(c("Music","Sport","Control"))
pd <- position_dodge(.35)
ggplot(wasisum, aes(x = Group, y = mean, fill =Group)) +
geom_bar(position = position_dodge(.35), width = .5, stat = "identity")+
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), colour="black", width=.1, position=pd)+
ylab("Accuracy") +
expand_limits(y=100) +
scale_y_continuous(breaks=0:100*10) +
theme_bw()+
theme(axis.line = element_line(size = 1, colour = "black"),
panel.grid.major = element_line(colour = "#d3d3d3"),
panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank(),
text = element_text(family = "Times"),
axis.text.x = element_text(colour = "black", size = 10),
axis.text.y = element_text(colour = "black", size = 10)) +
scale_fill_brewer(palette = "Dark2")

Great. Now, let's look at differences between conditions.
#subset for 1 year
flankersum_3 = flankersum_acc[flankersum_acc$Year =="Y5",]
pd <- position_dodge(0.7)
ggplot(flankersum_3, aes(x=Condition, y=mean, fill=Group)) +
geom_bar(position=position_dodge(0.7), width = .8,stat= "identity", size = 2) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), colour="gray43", width=.1, position=pd) +
xlab("Condition") +
ylab("% Accuracy") +
theme_bw() +
scale_fill_brewer(palette = "Accent", #changes colour for group
name="Group",
breaks=c('1', '2', '3'),
labels=c('Music', 'Sport', 'Control')) + #changes label for group
theme(axis.line = element_line(size = 1, colour = "black"),
panel.grid.major = element_line(colour = "#d3d3d3"),
panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank(),
text = element_text(family = "Times"),
axis.text.x = element_text(colour = "black", size = 10),
axis.text.y = element_text(colour = "black", size = 10))

And… between years and conditions
ggplot(flankersum_acc, aes(x=Group, y=mean, fill=Condition)) +
geom_bar(position=pd, width = .8,stat= "identity", size = 2) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), colour="gray43", width=.1, position=pd) +
xlab("Group") +
ylab("% Accuracy") +
theme_bw() +
scale_fill_brewer(palette = "Accent")+ #changes color for condition
scale_x_discrete(breaks = c("1","2","3"),
labels = c("Music","Sport","Control"))+ #changes label for groups
theme(axis.line = element_line(size = 1, colour = "black"),
panel.grid.major = element_line(colour = "#d3d3d3"),
panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank(),
text = element_text(family = "Times"),
axis.text.x = element_text(colour = "black", size = 10),
axis.text.y = element_text(colour = "black", size = 10)) +
facet_grid(. ~ Year, scales = "free")

Really quickly, let's just look at some paired data, and how we switched left and right, as appropriate. This is cortical thickness data, with Left & Right hemispheres. This is actually a summary spreadsheet already, saved as an xlsx.
## New names:
## * `` -> ...1
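The `## New names:` message above is what readxl prints when a spreadsheet has an unnamed first column; the actual read isn't shown in this chunk. Here is a minimal sketch of what it might look like — the file name and the factor columns are assumptions, not from the original:

```r
library(readxl) #read_excel() lives here

#read the summary sheet; an unnamed first column gets auto-named "...1"
ctsum <- read_excel("ct_summary.xlsx") #file name is a placeholder

#make sure the grouping variables are factors before plotting
ctsum$Group <- as.factor(ctsum$Group)
ctsum$Side <- as.factor(ctsum$Side)
```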
pd <- position_dodge(.68)
ggplot(ctsum, aes(x=Group, y=mean, fill=Side)) +
geom_bar(position=pd, width = .8,stat= "identity", size = 2) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), colour="gray43", width=.1, position=pd) +
xlab("Group") +
ylab("AC") +
ggtitle("CT: AC") +
scale_fill_brewer(palette = "Pastel2",
direction = -1, #THIS switches the direction, so Left plots on the left. (direction should be a number, not a quoted "-1")
name="Side",
breaks=c('Left', 'Right'),
labels=c('Left', 'Right')) +
expand_limits(y=3.5) +
scale_y_continuous(breaks=0:60) + #a tick at every integer
theme_bw() +
theme(legend.justification=c(1,0),
legend.position=c(.98,.72),
text = element_text(family = "Times"),
plot.title = element_text (hjust = .5),
axis.text.x = element_text(colour = "black", size = 10),
axis.text.y = element_text(colour = "black", size = 10),
legend.title.align = .5,
legend.box.background = element_rect(colour = "gray43", size = .75))
Another type: this is Assal’s preferred scheme.
Note the manual changes made
pd <- position_dodge(0.6)
ggplot(flankersum_acc, aes(x = Group, y = mean, fill =Condition)) +
geom_bar(position = pd, width = .9, stat = "identity", alpha = 0.5)+ #alpha level changes transparency
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), colour="white", width=.1, position=pd)+
ylab("Accuracy") +
ggtitle("Flanker Fish") +
scale_x_discrete(breaks = c("1","2","3"),
labels = c("Music", "Sport","Control"))+ #Rename your ticks!
scale_fill_manual(name = "Condition",
breaks = c("Congruent","Incongruent", "Neutral"), #what they ARE called
labels = c("Congruent!", "Incongruent!", "Neutral!"), #what you WANT THEM TO BE called
values = c("yellow","cyan", "pink"))+ #choose your own colors!
theme_bw()+
theme(axis.line = element_line(size = 1, colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank(),
text = element_text(family = "Times", colour = "white"),
plot.background = element_rect(fill = "black", colour = "black"),
legend.background = element_blank(),
legend.box.background = element_blank(),
legend.key = element_rect(fill = "black"),
plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(colour = "white", size = 10, angle = 45), #angle makes an angle!
axis.text.y = element_text(colour = "white", size = 10))+
facet_grid(. ~Year, labeller = as_labeller(c("Y3" = "Year 3", "Y4" = "Year 4", "Y5" = "Year 5")), scales = "free") + #change names of facets
theme(strip.background = element_rect(fill = "gray28"), #change the colors of your facet boxes
strip.text = element_text(colour = "white")) #change text of facets
Now, onto line graphs! yay! Line graphs are really great at showing how things change over time.
Let’s start with a simple line graph. All groups, over time
#make a summary table of the totals
#make a subset
totalflanker = fullflanker_long[fullflanker_long$Condition =="Total",]
#make a summary
totalsum = ddply(totalflanker, c("Group","Year"), summarise,
N = length(RTStandard), # calculates # of individuals per group by year
mean = mean(RTStandard), # calculates mean for each group by year
sd = sd(RTStandard), # calculates sd for each group by year
se = sd / sqrt(N))
#refactor
totalsum$Year =as.factor(totalsum$Year)
pd <- position_dodge(0.2)
ggplot(totalsum, aes(x=Year, y=mean, colour=Group, group = Group)) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), colour="gray43", width=.1, position=pd) +
geom_line(position=pd, size = 2) +
geom_point(position=pd, size=2, shape=21, fill="white") +
xlab("Year") +
ylab("RT (ms)") +
ggtitle("RT over time") +
scale_colour_brewer(palette = "Accent",
name="Group",
breaks=c('1', '2', '3'),
labels=c('Music', 'Sport', 'Control')) +
theme_bw() +
theme(legend.position= "right",
text = element_text(family = "Times"),
axis.text.x = element_text(colour = "black", size = 10, family = "Times"),
axis.text.y = element_text(colour = "black", size = 10, family = "Times"),
plot.title = element_text(colour = "black", size = 14, hjust = 0.5, family = "Times"),
legend.title.align = 0.5,
legend.box.background = element_rect(colour = "gray43", size = .75))
pd <- position_dodge(0.2)
ggplot(flankersum_acc, aes(x=Year, y=mean, colour=Group, group = Group)) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), colour="gray43", width=.1, position=pd) +
geom_line(position=pd, size = 2) +
geom_point(position=pd, size=2, shape=21, fill="white") +
xlab("Year") +
ylab("% Accuracy") +
ggtitle("Accuracy over time") +
scale_colour_brewer(palette = "Accent",
name="Group",
breaks=c('1', '2', '3'),
labels=c('Music', 'Sport', 'Control')) +
scale_x_discrete(breaks = c("Y3","Y4","Y5"), # add this part in if you want to label "Y3" as just "3"
labels = c("3","4","5"))+
theme_bw() +
theme(legend.position= "right",
text = element_text(family = "Times"),
axis.text.x = element_text(colour = "black", size = 10, family = "Times"),
axis.text.y = element_text(colour = "black", size = 10, family = "Times"),
plot.title = element_text(colour = "black", size = 14, hjust = 0.5, family = "Times"),
legend.title.align = 0.5,
legend.box.background = element_rect(colour = "gray43", size = .75))+
facet_grid(. ~Condition, scales = "free")
Groups by color
digit_span$group = as.factor(digit_span$group)
ggplot(digit_span, aes(x = age_1, y = age_5, color = group)) +
geom_point()+
geom_smooth(method=lm) +
scale_colour_brewer(palette ="Spectral",
direction =-1,
name="Group",
breaks=c('1', '2', '3'),
labels=c('Music', 'Sport', 'Control'))
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
Gradient
ggplot(digit_span, aes(age_1, age_5, colour = age_1)) +
geom_point(shape = 16, size = 7, show.legend = FALSE)+
theme_minimal() +
scale_color_gradient(low = "#0991ff", high = "#f0650e") +
theme(text = element_text(family = "Times"),
axis.text.x = element_text(colour = "black", size = 10),
axis.text.y = element_text(colour = "black", size = 10))
## Warning: Removed 1 rows containing missing values (geom_point).
That’s cool, but they’re all stacked! Let’s add a jitter
plotx = ggplot(digit_span, aes(age_1, age_5, colour = age_1)) +
geom_jitter(shape = 16, size = 7, show.legend = FALSE) +
theme_minimal() +
scale_color_gradient(low = "#0991ff", high = "#f0650e") +
theme(text = element_text(family = "Times"),
axis.text.x = element_text(colour = "black", size = 10),
axis.text.y = element_text(colour = "black", size = 10))
print(plotx)
## Warning: Removed 1 rows containing missing values (geom_point).
Correlation Matrix
Because, as you can see above, we have terrible data for correlations, we are using mtcars (a built-in data set).
mydata <- mtcars[, c(1,3,4,5,6,7)]
cormat <- round(cor(mydata),2)
melted_cormat <- melt(cormat) #melt() comes from reshape2, so run library(reshape2) first
ggplot(data = melted_cormat, aes(x=Var1, y=Var2, fill=value)) +
geom_tile()
Add more information
# Get lower triangle of the correlation matrix
get_lower_tri<-function(cormat){
cormat[upper.tri(cormat)] <- NA
return(cormat)
}
# Get upper triangle of the correlation matrix
get_upper_tri <- function(cormat){
cormat[lower.tri(cormat)]<- NA
return(cormat)
}
upper_tri <- get_upper_tri(cormat)
#drop the nas
melted_cormat <- melt(upper_tri, na.rm = TRUE)
ggheatmap <- ggplot(data = melted_cormat, aes(Var2, Var1, fill = value))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "#0991ff", high = "#f0650e", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1))+
coord_fixed()
print(ggheatmap)
Great. Now let’s put some numbers in it.
ggheatmap +
geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.major = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.ticks = element_blank(),
legend.justification = c(1, 0),
legend.position = c(0.6, 0.7),
legend.direction = "horizontal")+
guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
title.position = "top", title.hjust = 0.5))
myModel_ef1 <- effect(term="Group*Year", mod=myModel)
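Note that `effect()` (from the effects package) needs a fitted model object, and `myModel` isn't defined in this chunk — it would have been fit earlier. Here is a minimal sketch of what that fit might look like; the formula, the `Participant` ID column, and the use of `lme()` from nlme are assumptions, not from the original:

```r
library(nlme) #lme() for linear mixed-effects models

#hypothetical fit: RT predicted by Group, Year, and their interaction,
#with a random intercept per participant
myModel <- lme(RTStandard ~ Group * Year,
               random = ~ 1 | Participant, #"Participant" column is assumed
               data = fullflanker_long,
               na.action = na.omit)
summary(myModel)
```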
efdata1<-as.data.frame(myModel_ef1) #convert the effects list to a data frame
ggplot(efdata1, aes(x=Year, y=fit, color=Group,group=Group)) +
geom_point() +
geom_line(size=1.2) +
geom_ribbon(aes(ymin=fit-se, ymax=fit+se, fill=Group),alpha=0.3) +
labs(title = "your title here", x= "Time", y="Average", color="Group", fill="Group") +
theme_classic() +
theme(text=element_text(size=20))
#Setup
theme_set(theme_bw())
world <- ne_countries(scale = "medium", returnclass = "sf") #ne_countries() comes from rnaturalearth; load it (and sf) first
#super basic plot
ggplot(data = world) +
geom_sf() +
xlab("Longitude") + ylab("Latitude") +
ggtitle("World map", subtitle = paste0("(", length(unique(world$subunit)), " countries)"))
Add populations and colors
ggplot(data = world) +
geom_sf(aes(fill = pop_est)) +
scale_fill_viridis_c(option = "plasma", trans = "sqrt")
Using the coordinate system
ggplot(data = world) +
geom_sf() +
coord_sf(crs = "+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs ")
There is a lot more you can do. Here is the link for some of it (click around the site)
Grobs are this cool way of displaying both tables and graphs together.
Really, you can display whatever you want together. Even pictures, I think. I’m pretty sure this works best for ggplots, though. I could be wrong.
Here are two graphs side by side
Switch them up
Add more
With a table
(obviously the arrangement below doesn’t make any sense at all, but the sentiment is there.)
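Since the gridExtra calls themselves didn't make it into this section, here is a minimal sketch of the pattern, using two toy ggplots built from mtcars as stand-ins for your own — `grid.arrange()` lays grobs out in a grid, and `tableGrob()` turns a data frame into a grob you can slot in next to a plot:

```r
library(gridExtra) #grid.arrange() and tableGrob()
library(ggplot2)

#two toy ggplots (stand-ins for your own graphs)
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

#two graphs side by side
grid.arrange(p1, p2, ncol = 2)

#switch them up
grid.arrange(p2, p1, ncol = 2)

#add more: a 2x2 layout
grid.arrange(p1, p2, p2, p1, ncol = 2)

#with a table: tableGrob() converts a data frame into a displayable grob
tbl <- tableGrob(head(mtcars[, 1:3]))
grid.arrange(p1, tbl, ncol = 2)
```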